Skip to content

Conversation

@dhirschfeld
Copy link
Contributor

The following shim modules are included:

  • pandas.computation
  • pandas.tools.hashing
  • pandas.types

Fixes #16138

…bility

The following shim modules are included:
  - `pandas.computation`
  - `pandas.tools.hashing`
  - `pandas.types`

Fixes pandas-dev#16138
@dhirschfeld
Copy link
Contributor Author

My testing of the 0.20.0 release candidate uncovered a large number of breakages in both dask and odo.

I completely understand the need for refactoring but I really don't think it's good practice to simply break non-deprecated apis without any warning especially with a library as far down the stack as pandas. I think this is especially true when it's a fairly simple task to provide a shim module with a warning to at least give 3rd parties a chance to make the necessary changes.

This PR provides the necessary shims for me to get both dask and odo working.

@jreback
Copy link
Contributor

jreback commented Apr 26, 2017

absolutely not
this is all non public

@dhirschfeld
Copy link
Contributor Author

How were 3rd party libraries supposed to determine that they were all private modules?

@dhirschfeld dhirschfeld mentioned this pull request Apr 26, 2017
@jreback
Copy link
Contributor

jreback commented Apr 26, 2017

none of these were ever in the top level exposed nor api namespaces which was announced in 0.19.0

if you are using it then it is by definition unstable

if packages didn't follow the guidelines then they are out of luck

@jreback
Copy link
Contributor

jreback commented Apr 26, 2017

i would take this only for the change in hashing

@TomAugspurger
Copy link
Contributor

@dhirschfeld can you give a sense of how much odo was relying on these subpackages? e.g. a CI run with test failures?

@dhirschfeld
Copy link
Contributor Author

dhirschfeld commented Apr 26, 2017

That may be the case, but when the api namespace was announced those modules should have been deprecated so that existing code would give a warning that it was deprecated.

You can say that everyone should have known but the reality is that code which worked on 0.19.2 and gave no deprecation warning will simply break with this release.

No one likes maintaining useless code but if it prevents users code from simply breaking without warning I don't think it's too great of a burden to do so for a short period whilst 3rd party libraries adjust.

With this PR pandas complains loudly about deprecated usages:

C:\Python\lib\site-packages\dask\dataframe\methods.py:6: FutureWarning: The pandas.types module is deprecated and will be removed in a future version. Please import from the pandas.core.dtypes module instead.
  from pandas.types.concat import union_categoricals
C:\Python\lib\site-packages\dask\dataframe\hashing.py:8: FutureWarning: The pandas.lib module is deprecated and will be removed in a future version. These are private functions and can be accessed from pandas._libs.lib instead
  from pandas.lib import is_bool_array
C:\Python\lib\site-packages\dask\dataframe\hashing.py:32: FutureWarning: The pandas.types module is deprecated and will be removed in a future version. Please import from the pandas.util.hashing module instead.
  from pandas.tools.hashing import hash_pandas_object

@dhirschfeld
Copy link
Contributor Author

@TomAugspurger - I think the only issue in odo was the use of NaTType fixed in #16137.

PR #15998 did provide a shim for that module with a deprecation warning but it also simultaneously broke the previous api hence rendering the shim/warning ineffective

This PR effectively does the same thing for the other moved modules.

@dhirschfeld
Copy link
Contributor Author

At the end of the day, it's your call - I just got a nasty surprise when trying the rc.

It might just be dask/odo which were using the private/deprecated modules but with 3rd party code there's really no way to tell. So long as dask gets out a release before pandas maybe it'll be fine...

If you are considering providing the shims I can fix up the failing test_api in this PR.

@shoyer
Copy link
Member

shoyer commented Apr 26, 2017

The reality is that the exact public API for pandas has been ambiguous. It's great that we're finally clearing this up, but there's no prize for needlessly breaking downstream libraries, and doing this without even a deprecation warning seems needlessly punitive -- especially when there is literally no cost to maintaining this for now.

That said, there is no place for deprecation warnings that suggest switching to import modules in pandas.core, the entirety of which is private API. The messages should instruct users to switch to supported public APIs.

@dhirschfeld
Copy link
Contributor Author

That said, there is no place for deprecation warnings that suggest switching to import modules in pandas.core, the entirety of which is private API. The messages should instruct users to switch to supported public APIs.

Sure, if there's an appropriate public api they should be pointed to that. I just wasn't clear in pandas where that line is drawn. I can appreciate though that there is an effort to document it and make it clearer - e.g. #16087

@jorisvandenbossche
Copy link
Member

Indeed, all this effort is to try to make the distinction between public and private functionality clearer, to prevent such problems in the future.

We already added some shims (eg for lib, tslib, json, parser, tools.merge, tools.plotting), but based on feedback we can certainly a) expand the shims and b) discuss whether some functionality should be exposed in a public location (eg NaTType in #16137)

It might be the case that dask is a bit a special case in that they used new, private functionality. But it is difficult to be sure about that. I tried using github search to look for repo's that currently use private functionality, but that github search is almost completely useless as I 'find' all the (huge amount of) repo's that have vendored the full pandas codebase .. So if people know better ways to search for this, all ears!

@TomAugspurger
Copy link
Contributor

Sure, if there's an appropriate public api they should be pointed to that. I just wasn't clear in pandas where that line is drawn.

I think Stephan was talking specifically about union_categoricals, since your current message redirects people to pandas.core. The new, public, home of union_categoricals is pandas.api.types

@dhirschfeld
Copy link
Contributor Author

That's fine for union_categoricals but how about the computation module which is now under core and AFAICS there's no public exposure for any of the function in it - e.g. pd.computation.expressions.set_use_numexpr? Should the warning say the module is depricated and will be made private in future?

@jorisvandenbossche
Copy link
Member

I think Stephan was talking in general (or at least, that is my point of vue). So in general, the deprecation messages should just say that the module is deprecated and will be removed in the future, without any alternative import location. Only for some specific functions (like union_categoricals) we should add a custom deprecation warning pointing to the public import path.

I added a comment in #13634 (comment) to discuss whether we want to expose some of those publicly (I think set_use_numexpr is the main one to discuss)


warnings.warn(
"The pandas.computation module is deprecated and will be removed in a future "
"version. Please import from the pandas.core.computation module instead.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the "Please import from the pandas.core.computation module instead." should be removed IMO.


warnings.warn(
"The pandas.types module is deprecated and will be removed in a future "
"version. Please import from the pandas.core.dtypes module instead.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here as well, the "Please import from the pandas.core.dtypes module instead." should be removed.

There is part of these functions available in pandas.api.types, but not everything. But maybe we can point to that in general?

from pandas.io.api import *
from pandas.util._tester import test
import pandas.testing
from . import computation # TODO: Remove when deprecated shim module is removed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will trigger the deprecation warning already (so showing it on just importing pandas), so we have to have a workaround this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can possibly be solved by using the _DeprecatedModule some lines below

@dhirschfeld
Copy link
Contributor Author

@jorisvandenbossche - thanks for the review I can take a look at pushing the suggested fixes in the morning.

@jreback
Copy link
Contributor

jreback commented Apr 26, 2017

what project is using this? pd.computation.expressions.set_use_numexpr

this is also a private API. It should not be exposed at all. But I can see a config option to do this instead.

everything else that is listed here I am not inclined to deprecate at all. This is just way to much reaching in by external users. This is the entire point of the announcement in 0.19.0. types is in no way public at all. Certain long time 'public' modules sure are ok to deprecate (but as you can see with very narrow focus as to what is deprecated, everything else is simply dropped).

if there are specific cases I am all ears.

@jorisvandenbossche
Copy link
Member

pd.computation.expressions.set_use_numexpr. this is also a private API. It should not be exposed at all. But I can see a config option to do this instead.

Which means exposing it? But if we want to expose the ability to set the use of numexpr, an option indeed seems like the way to go (if that has no perf penalties). Eg pandas.options.use_numexpr or pandas.options.mode.use_numexpr (we could possibly do the same for bottleneck, not sure if that would be used apart from testing)

@jorisvandenbossche
Copy link
Member

Certain long time 'public' modules sure are ok to deprecate

@jreback I think pandas.computation (for which @dhirschfeld provided a shim in this PR) classifies as such a long-time 'public' module?

@jreback
Copy link
Contributor

jreback commented Apr 26, 2017

i will expose use_numexpr as an option

@chris-b1
Copy link
Contributor

@jreback - one pseudo-public thing from pandas.types was union_categoricals, in docs here.

@jreback
Copy link
Contributor

jreback commented Apr 27, 2017

@jorisvandenbossche
Copy link
Member

@jreback yes, but I think what @chris-b1 wants to say that is then could also use a deprecation warning (since it was documented using the old path, so it was kind of public)

@jreback
Copy link
Contributor

jreback commented Apr 27, 2017

@jreback yes, but I think what @chris-b1 wants to say that is then could also use a deprecation warning (since it was documented using the old path, so it was kind of public)

ok, guess could do that.

jreback added a commit to jreback/pandas that referenced this pull request Apr 27, 2017
@jreback jreback closed this in 075eca1 Apr 27, 2017
@dhirschfeld dhirschfeld deleted the compat branch April 27, 2017 22:43
pcluo pushed a commit to pcluo/pandas that referenced this pull request May 22, 2017
* DEPR: allow options for using bottleneck/numexpr

deprecate pd.computation.expressions.set_use_numexpr()

* DEPR: pandas.types.concat.union_categoricals in favor of pandas.api.type.union_categoricals

closes pandas-dev#16140
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants