@rhshadrach
Member

@rhshadrach rhshadrach commented Dec 4, 2021

  • Ensure all linting tests pass, see here for how to run them

Follow-up to #44358. Both reverts to the tests were from #44449. Will post ASV results.

ASV
       before           after         ratio
     [aee6b7ab]       [88928c06]
     <DataFrameGroupBy.value_counts~1^2>       <stacklevel_reverts>
+     1.54±0.03ms       2.80±0.2ms     1.81  algorithms.Factorize.time_factorize(True, False, 'Int64')
+         553±4μs         671±40μs     1.21  arithmetic.NumericInferOps.time_divide(<class 'numpy.int8'>)
+     12.1±0.05ms      14.2±0.04ms     1.17  strings.Methods.time_isspace('str')
+     20.2±0.05ms       23.5±0.2ms     1.16  groupby.Nth.time_series_nth_any('object')
+     2.49±0.02ms      2.86±0.01ms     1.15  frame_methods.Fillna.time_frame_fillna(True, 'bfill', 'float32')
+        311±10ns          352±9ns     1.13  index_cached_properties.IndexCache.time_inferred_type('Int64Index')
+      17.9±0.1ms       20.2±0.1ms     1.13  groupby.Apply.time_copy_function_multi_col(5)
+        442±20ns         494±30ns     1.12  index_cached_properties.IndexCache.time_is_monotonic_increasing('RangeIndex')
+      1.40±0.1μs       1.56±0.1μs     1.11  index_cached_properties.IndexCache.time_values('UInt64Index')
+        489±20ns         543±20ns     1.11  index_cached_properties.IndexCache.time_shape('Int64Index')
+      3.34±0.2μs       3.68±0.3μs     1.10  index_cached_properties.IndexCache.time_engine('TimedeltaIndex')
-     1.50±0.02μs      1.37±0.01μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 2000, tzlocal())
-     1.41±0.02μs         1.28±0μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 12000, datetime.timezone.utc)
-     1.52±0.01μs      1.38±0.01μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 5000, tzlocal())
-     1.51±0.03μs      1.37±0.01μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 4006, None)
-     1.51±0.01μs      1.37±0.01μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 4006, tzlocal())
-     1.41±0.02μs      1.28±0.01μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 4000, datetime.timezone.utc)
-     1.52±0.03μs         1.38±0μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 12000, tzlocal())
-     1.50±0.05μs         1.36±0μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 1011, None)
-     4.55±0.01μs      4.12±0.01μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 3000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
-     4.54±0.01μs      4.12±0.03μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 12000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
-     1.42±0.02μs      1.29±0.01μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 1011, datetime.timezone.utc)
-     1.53±0.01μs      1.38±0.02μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 12000, tzlocal())
-      4.54±0.1μs      4.12±0.05μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 5000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
-     4.56±0.04μs      4.13±0.05μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 11000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
-      1.48±0.1μs      1.34±0.03μs     0.91  index_cached_properties.IndexCache.time_shape('PeriodIndex')
-     4.52±0.07μs      4.10±0.01μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 9000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
-     1.51±0.04μs         1.36±0μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 9000, None)
-     1.40±0.03μs         1.27±0μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 2011, datetime.timezone.utc)
-     1.40±0.03μs         1.27±0μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 10000, datetime.timezone.utc)
-        1.41±0μs      1.28±0.02μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 4006, datetime.timezone.utc)
-     1.40±0.03μs      1.27±0.02μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 8000, datetime.timezone.utc)
-     1.41±0.02μs         1.28±0μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 9000, datetime.timezone.utc)
-     1.51±0.03μs      1.36±0.02μs     0.91  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 8000, None)
-     1.51±0.05μs      1.36±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 8000, None)
-     1.51±0.02μs      1.37±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 11000, None)
-      1.10±0.2μs         998±70ns     0.90  index_cached_properties.IndexCache.time_inferred_type('TimedeltaIndex')
-     1.53±0.04μs      1.38±0.02μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 9000, None)
-     1.51±0.03μs         1.36±0μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 2000, None)
-     1.52±0.02μs      1.37±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 11000, None)
-     1.51±0.03μs      1.37±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 2000, tzlocal())
-     1.42±0.03μs      1.28±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 7000, datetime.timezone.utc)
-     1.41±0.02μs      1.27±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 10000, datetime.timezone.utc)
-     1.51±0.03μs      1.36±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 2000, None)
-     1.52±0.03μs      1.37±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 8000, tzlocal())
-     1.51±0.02μs      1.37±0.02μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 1000, tzlocal())
-     1.51±0.05μs         1.36±0μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 3000, None)
-     1.52±0.03μs      1.37±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 10000, None)
-     1.52±0.03μs      1.37±0.03μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 1011, tzlocal())
-     1.51±0.03μs      1.36±0.02μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 6000, tzlocal())
-     1.53±0.03μs      1.38±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 2011, tzlocal())
-     1.41±0.03μs      1.28±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 1000, datetime.timezone.utc)
-     1.51±0.03μs      1.36±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 12000, None)
-     1.51±0.03μs         1.36±0μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 3000, tzlocal())
-     1.52±0.02μs      1.37±0.02μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 4006, tzlocal())
-     1.50±0.02μs      1.35±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 10000, None)
-     1.52±0.05μs      1.37±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 1000, None)
-     1.52±0.01μs      1.37±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 10000, tzlocal())
-      4.57±0.2μs      4.12±0.03μs     0.90  tslibs.normalize.Normalize.time_normalize_i8_timestamps(1, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
-     1.42±0.01μs      1.28±0.02μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 5000, datetime.timezone.utc)
-     1.53±0.01μs      1.38±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 11000, tzlocal())
-     1.40±0.02μs      1.26±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 5000, datetime.timezone.utc)
-     1.51±0.05μs      1.36±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 4000, None)
-     1.51±0.03μs      1.36±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 1011, None)
-     1.42±0.02μs      1.27±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 1011, datetime.timezone.utc)
-     1.53±0.03μs      1.37±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 7000, tzlocal())
-     1.52±0.03μs      1.37±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 9000, tzlocal())
-     1.52±0.02μs      1.36±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 4000, tzlocal())
-     4.63±0.03μs      4.16±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 2000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
-     1.53±0.03μs      1.38±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 5000, tzlocal())
-      4.59±0.2μs      4.12±0.02μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 1000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
-     1.51±0.04μs      1.36±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 5000, None)
-     1.52±0.02μs      1.36±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 11000, tzlocal())
-     1.53±0.03μs      1.37±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 9000, tzlocal())
-     1.51±0.02μs      1.36±0.01μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 5000, None)
-     1.52±0.05μs      1.37±0.02μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 1000, None)
-     1.53±0.01μs      1.37±0.02μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 1011, tzlocal())
-     1.53±0.01μs      1.37±0.02μs     0.90  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 3000, tzlocal())
-     1.52±0.06μs      1.36±0.01μs     0.89  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 12000, None)
-      2.73±0.2μs       2.44±0.1μs     0.89  index_cached_properties.IndexCache.time_shape('CategoricalIndex')
-     1.53±0.04μs      1.37±0.02μs     0.89  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 1000, tzlocal())
-     1.54±0.02μs      1.37±0.02μs     0.89  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 7000, tzlocal())
-     1.53±0.03μs      1.36±0.02μs     0.89  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 6000, tzlocal())
-     1.51±0.02μs      1.35±0.02μs     0.89  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 3000, None)
-     1.54±0.03μs      1.37±0.01μs     0.89  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 7000, None)
-     1.54±0.02μs      1.37±0.01μs     0.89  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 2011, tzlocal())
-      3.63±0.2μs       3.23±0.2μs     0.89  index_cached_properties.IndexCache.time_engine('UInt64Index')
-     1.53±0.05μs      1.36±0.01μs     0.89  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 2011, None)
-     1.53±0.01μs      1.36±0.02μs     0.89  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 10000, tzlocal())
-      2.00±0.2μs       1.77±0.2μs     0.89  index_cached_properties.IndexCache.time_inferred_type('IntervalIndex')
-     1.54±0.04μs      1.37±0.02μs     0.89  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(0, 4000, tzlocal())
-     12.2±0.08ms      10.8±0.08ms     0.88  indexing.InsertColumns.time_insert
-        9.86±1μs      8.71±0.09μs     0.88  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<BYearBegin: month=1>)
-     1.54±0.06μs      1.36±0.02μs     0.88  tslibs.period.TimeDT64ArrToPeriodArr.time_dt64arr_to_periodarr(1, 4006, None)
-        336±30ns        296±0.3ns     0.88  dtypes.Dtypes.time_pandas_dtype(dtype('uint16'))
-     1.43±0.01μs         1.25±0μs     0.87  tslibs.normalize.Normalize.time_normalize_i8_timestamps(1, datetime.timezone.utc)
-      1.82±0.1μs      1.58±0.01μs     0.87  tslibs.normalize.Normalize.time_normalize_i8_timestamps(100, None)
-     1.43±0.01μs      1.24±0.01μs     0.87  tslibs.normalize.Normalize.time_normalize_i8_timestamps(0, datetime.timezone.utc)
-     1.60±0.08μs      1.33±0.02μs     0.83  tslibs.normalize.Normalize.time_normalize_i8_timestamps(0, None)
-     1.60±0.05μs      1.33±0.01μs     0.83  tslibs.normalize.Normalize.time_normalize_i8_timestamps(1, None)
-     1.61±0.07μs      1.34±0.01μs     0.83  tslibs.normalize.Normalize.time_normalize_i8_timestamps(1, tzlocal())
-     1.64±0.06μs      1.33±0.01μs     0.81  tslibs.normalize.Normalize.time_normalize_i8_timestamps(0, tzlocal())
-     7.28±0.01ms      5.88±0.06ms     0.81  groupby.Categories.time_groupby_ordered_nosort
-     6.74±0.04ms      5.42±0.03ms     0.80  series_methods.ValueCounts.time_value_counts(10000, 'object')
-      11.3±0.3ms      9.02±0.02ms     0.80  groupby.MultiColumn.time_col_select_numpy_sum
-     7.37±0.02ms      5.85±0.01ms     0.79  groupby.Categories.time_groupby_nosort
-     12.5±0.05ms      9.88±0.02ms     0.79  groupby.MultiColumn.time_cython_sum
-      11.0±0.3ms       8.68±0.1ms     0.79  io.csv.ReadCSVConcatDatetimeBadDateValue.time_read_csv('')
-     6.97±0.06ms      5.43±0.01ms     0.78  groupby.Categories.time_groupby_extra_cat_nosort
-      13.8±0.6ms       10.5±0.3ms     0.76  io.csv.ReadCSVConcatDatetimeBadDateValue.time_read_csv('nan')
-     5.26±0.04ms      3.96±0.03ms     0.75  series_methods.ValueCountsObjectDropNAFalse.time_value_counts(10000)
-     2.86±0.02ms      1.68±0.02ms     0.59  series_methods.Map.time_map('dict', 'object')
-     2.63±0.04ms      1.38±0.02ms     0.52  series_methods.Map.time_map('Series', 'object')
-     1.82±0.01ms          682±4μs     0.38  series_methods.ValueCounts.time_value_counts(1000, 'object')
-     1.65±0.01ms          529±3μs     0.32  series_methods.ValueCountsObjectDropNAFalse.time_value_counts(1000)
-      10.9±0.5ms       2.62±0.2ms     0.24  multiindex_object.Integer.time_get_indexer
-       163±0.4ms       30.6±0.2ms     0.19  groupby.GroupByMethods.time_dtype_as_field('uint', 'describe', 'direct', 5)
-       160±0.5ms       28.8±0.1ms     0.18  groupby.GroupByMethods.time_dtype_as_field('int', 'describe', 'direct', 5)
-       160±0.1ms       28.7±0.1ms     0.18  groupby.GroupByMethods.time_dtype_as_field('float', 'describe', 'direct', 5)
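
For anyone reading the table above: the ratio column is `after / before`, so rows marked `-` (ratio below 1) got faster on this branch and rows marked `+` got slower. Below is a minimal sketch for parsing one row and recomputing the ratio. The helper names `to_seconds` and `parse_row` are hypothetical and not part of asv, and the parsing assumes the five-column layout shown above.

```python
import re

# Multipliers that convert the unit suffixes used in the table to seconds.
_UNITS = {"ns": 1e-9, "μs": 1e-6, "us": 1e-6, "ms": 1e-3, "s": 1.0}

# Matches a cell such as "1.54±0.03ms" or "682±4μs": value, spread, unit.
_TIMING = re.compile(r"([0-9.]+)±[0-9.]+(ns|μs|us|ms|s)")


def to_seconds(cell):
    """Convert one timing cell (value±spread with a unit) to seconds."""
    match = _TIMING.fullmatch(cell)
    if match is None:
        raise ValueError(f"unrecognized timing cell: {cell!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def parse_row(line):
    """Split a comparison row into (before_s, after_s, ratio, benchmark).

    Assumes the layout: sign, before, after, ratio, benchmark name.
    The benchmark name may itself contain spaces, hence maxsplit=4.
    """
    _sign, before, after, ratio, name = line.split(None, 4)
    return to_seconds(before), to_seconds(after), float(ratio), name.strip()
```

Dividing the second returned value by the first should land close to the reported ratio. Small differences are expected because the table prints rounded timings.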

@rhshadrach rhshadrach added the Performance, Regression, and Warnings labels Dec 4, 2021
@rhshadrach rhshadrach added this to the 1.4 milestone Dec 4, 2021
@jreback jreback merged commit f9ecd53 into pandas-dev:master Dec 4, 2021
@jreback
Contributor

jreback commented Dec 4, 2021

thanks @rhshadrach

@TomAugspurger
Contributor

Thanks for the follow-up on this @rhshadrach.
