- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 19.2k
ENH: making value_counts stable/keeping original ordering #39009
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| There are still some issues that needs to be addressed/discussed: 
 cpdef stable_value_count_object(ndarray[object] values, bint dropna, na_representant=np.nan):
    ...
    for i in range(n):
        val = values[i]
        if checknull(val):
            val = na_representant  # uses one representant for all null-objects
        if not checknull(val) or not dropna:
            k = kh_get_{{ttype}}(table, <PyObject*>val)
            ...this would be slightly faster than the current approach, would work also for  
 | 
| surprising asv_bench showed no changes when ran with:  | 
6aa9652    to
    8718976      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good a few comments. pls add a whatsnew note in bug fixes reshaping section. also if this is closing multiple issues (iir also the 32-bit ordering one?) then indicate them there as well.
| result_keys = {{dtype.title()}}Vector() | ||
| {{else}} | ||
| build_count_table_{{dtype}}(values, table, dropna) | ||
| result_keys = {{'U'+dtype[1::].title()}}Vector() | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was just too clever, changed to {{name} here: 9a17c13
8718976    to
    0966c2b      
    Compare
  
    0966c2b    to
    e64b121      
    Compare
  
    | I went on with 2. from comment #39009 (comment) I think 1. and 3. from this comment are better as separate PRs. | 
e64b121    to
    e532854      
    Compare
  
    | can you merge master. | 
e532854    to
    d480b7c      
    Compare
  
    | https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=52863&view=logs&j=acc1347f-8235-55b0-95c7-0e2189e61659&t=486f34f1-eda6-516a-fc5d-d8195128dacd i think you need to update this test (was added since you did this PR originally) | 
| @jreback Thanks, fixed now | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm minor comments. ping on green
        
          
                doc/source/whatsnew/v1.3.0.rst
              
                Outdated
          
        
      | - Bug in :func:`join` over :class:`MultiIndex` returned wrong result, when one of both indexes had only one level (:issue:`36909`) | ||
| - :meth:`merge_asof` raises ``ValueError`` instead of cryptic ``TypeError`` in case of non-numerical merge columns (:issue:`29130`) | ||
| - Bug in :meth:`DataFrame.join` not assigning values correctly when having :class:`MultiIndex` where at least one dimension is from dtype ``Categorical`` with non-alphabetically sorted categories (:issue:`38502`) | ||
| - ``value_counts`` returns keys in original order (:issue:`12679`) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:meth:`Series.value_counts`, can you add the other issue that is being closed here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done d85b1aa
| is_null = checknull(val) | ||
| if is_null: | ||
| val = navalue | ||
| if not is_null or not dropna: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be elif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how it could work with elif (but I'm always struggeling with logic). I have rearanged the if and hope it is clearer now: 230215f
| @jreback green | 
| thanks @realead | 
closes #12679
closes #11227
The order of the returned keys for
value_countsaren't arbitrary (i.e. depending on the used hash function) but are original ordering (when sorted this applies for the keys with the same number of values).