@jorisvandenbossche
Member

Closes #8614

  • this adds a warning that you now have to use to_pydatetime, as plotting a DatetimeIndex directly with matplotlib no longer works (see #8614, "Plotting of DatetimeIndex directly with matplotlib no longer gives datetime formatted axis (0.15)")
  • I also removed, for now, the note on speed and the explanation of the registered formatters. @agijsberts could you shed some light on this?
    • "The speed up for large data sets only applies to pandas 0.14.0 and later." Why only for pandas 0.14 or later? And where does this speed-up come from?
    • "thereby extending date and time support to practically all plot types available in matplotlib" -> but if you plot directly with matplotlib, I think you don't use the pandas registered formatters? So that sentence does not seem fully correct. And isn't that the reason for the possible speed-up (matplotlib's default formatter being faster than pandas' formatter)?

@agijsberts
Contributor

@jorisvandenbossche I wrote that documentation to reflect the changes in PR #6650 that I prepared for issue #6636. In short, pandas does register its own formatters (see https://github.com/pydata/pandas/blob/master/pandas/tseries/converter.py#L27), so this behavior is entirely part of pandas. The PR obtained a speed-up by replacing a call to matplotlib's date2num (explicit for-loop) with epoch2num (vectorized), hence the statement that it's much faster since pandas 0.14.0.

It seems that the formatters registered by pandas are still correctly called:

In [1]: from pandas import DataFrame, date_range

In [2]: import matplotlib.units as units

In [3]: df = DataFrame(range(100), index = date_range('20130101', periods=100))

In [4]: units.registry.get_converter(df.index)
Out[4]: <pandas.tseries.converter.DatetimeConverter instance at 0x3f33f38>

So far I do not see any obvious cause for the new problems, but I'll dive deeper into this later on.

@jorisvandenbossche
Member Author

Ah, yes, I was confusing the 'converter' (which is registered in the matplotlib units registry) with the formatting of the labels (which is also nicer in pandas, but which you only get when plotting with pandas' plot and not when plotting directly with matplotlib's plot, whereas the unit converter works for both).

But, two points:

  • your statement about the improved performance also holds true when plotting with pandas, so it is not specific to plotting directly with matplotlib and seems a bit out of place in this section?

  • The units.registry.get_converter(df.index) call does indeed still work. The problem is that when plotting with plt.plot(df.index, df['col']), the df.index is first converted to an array of datetime64, and this is no longer recognized by matplotlib:

    In [35]: np.asarray(df.index)
    Out[35]:
    array(['2013-01-01T01:00:00.000000000+0100',
           ...
           '2013-04-10T02:00:00.000000000+0200'], dtype='datetime64[ns]')
    
    In [36]: units.registry.get_converter(np.asarray(df.index))
    
    -> None
    

    To complicate matters further, the plt.fill_between call in the example does not do this (it still converts the index to datetimes), and because of that the example now crashes.
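The lookup failure described above can be illustrated with a toy registry. This is only a sketch under assumptions: `ToyRegistry` is a hypothetical, simplified stand-in for a type-keyed converter registry and is not matplotlib's implementation, and the string "converters" are placeholders. It shows why an array of datetime64 finds nothing when only datetime.datetime is registered, and why adding an entry for np.datetime64 is enough to resolve it.

```python
import datetime

import numpy as np


class ToyRegistry(dict):
    """Hypothetical, simplified type-keyed registry (not matplotlib's code)."""

    def get_converter(self, values):
        arr = np.asarray(values)
        if arr.ndim and arr.size == 0:
            return None
        # For array-likes, look at the first element; otherwise at the value.
        sample = arr.ravel()[0] if arr.ndim else values
        # Walk the class hierarchy of the sample and return the first match.
        for cls in type(sample).__mro__:
            if cls in self:
                return self[cls]
        return None


registry = ToyRegistry()
registry[datetime.datetime] = "datetime converter"  # placeholder object

stamps = np.array(["2013-01-01", "2013-01-02"], dtype="datetime64[ns]")

# Elements are np.datetime64, which is unrelated to datetime.datetime.
found_before = registry.get_converter(stamps)

# Registering an entry for np.datetime64 makes the same lookup succeed.
registry[np.datetime64] = "datetime64 converter"
found_after = registry.get_converter(stamps)

# Converting to datetime.datetime objects first hits the original entry.
as_pydatetime = stamps.astype("datetime64[us]").astype(object)
found_py = registry.get_converter(as_pydatetime)

print(found_before, found_after, found_py)
```

In this toy model the first lookup returns None, matching the `-> None` result shown in the session above.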

Member Author

"extending date and time support to practically all plot types available in matplotlib" -> @agijsberts but doesn't matplotlib also have a datetime converter by default?
https://github.com/matplotlib/matplotlib/blob/v1.4.2/lib/matplotlib/dates.py#L1380 We just overwrite it with ours?

@agijsberts
Contributor

@jorisvandenbossche Re. your points:

  • df.plot() did not benefit from the PR as it uses (at least back then) a different converter and formatter. See the comment on #6636 (Speed up DatetimeConverter for plotting) for details.
  • You're right, plt.plot ends up looking for a datetime64 converter, which does not exist. The easiest workaround to make things work as they did before is to register pandas' DatetimeConverter for datetime64:
from numpy import datetime64
import pandas as pd
import matplotlib.units as units
import matplotlib.pyplot as plt
df = pd.DataFrame(range(100), index = pd.date_range('20130101', periods=100))
units.registry[datetime64] = pd.tseries.converter.DatetimeConverter()
plt.plot(df.index, df)
plt.show()
  • The DateConverter in matplotlib is limited to datetime.datetime and datetime.date. Of course you could use it if you first convert the index with to_pydatetime(), but it is wasteful to convert datetime64 to datetime and then again to float. An example with a DatetimeIndex of length 100000:
In [22]: %timeit DatetimeConverter.convert(df.index, None, None)
10000 loops, best of 3: 74.4 us per loop

In [23]: %timeit DateConverter.convert(df.index.to_pydatetime(), None, None)
100 loops, best of 3: 9.35 ms per loop
  • When not importing pandas, I believe there are two options for plotting datetime64 on a time axis:
    1. convert to datetime.datetime and use matplotlib's DateConverter (slow, see above)
    2. manually convert datetime64 to matplotlib's time representation with epoch2num(dt.asi8 / 1.0E9). Of course, you are then still responsible for installing the date/time formatters. This conversion is, however, very fast and exactly what the PR implemented.
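The two options can be compared in a small NumPy-only sketch. It assumes the date-number convention matplotlib used at the time of this thread (days counted so that 1970-01-01 maps to its proleptic Gregorian ordinal, 719163); newer matplotlib releases may use a different default epoch, so treat the offset as an assumption. The function names `days_via_datetime` and `days_vectorized` are made up for illustration, and the arithmetic is written out by hand instead of calling epoch2num.

```python
import datetime

import numpy as np

SECONDS_PER_DAY = 86400.0
# Assumed offset: ordinal of the Unix epoch in the proleptic Gregorian calendar.
UNIX_EPOCH_ORDINAL = datetime.date(1970, 1, 1).toordinal()


def days_via_datetime(stamps):
    """Option 1: build datetime.datetime objects and convert one by one."""
    epoch = datetime.datetime(1970, 1, 1)
    # datetime64[us] converts to datetime.datetime objects via astype(object).
    pydt = stamps.astype("datetime64[us]").astype(object)
    return np.array(
        [(d - epoch).total_seconds() / SECONDS_PER_DAY + UNIX_EPOCH_ORDINAL
         for d in pydt]
    )


def days_vectorized(stamps):
    """Option 2: integer nanoseconds -> seconds -> days in one expression."""
    nanos = stamps.astype("datetime64[ns]").astype("int64")
    return nanos / 1.0e9 / SECONDS_PER_DAY + UNIX_EPOCH_ORDINAL


# 100 daily timestamps starting 2013-01-01, stored as datetime64[ns].
stamps = np.arange("2013-01-01", "2013-04-11", dtype="datetime64[D]")
stamps = stamps.astype("datetime64[ns]")

slow = days_via_datetime(stamps)
fast = days_vectorized(stamps)
print(np.allclose(slow, fast))
```

Both functions produce the same numbers; the second avoids creating one Python object per timestamp, which is the kind of saving the timings above point to.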

@jorisvandenbossche
Member Author

So our DatetimeConverter already works for datetime64? So we should just register it, and the initial problem is solved! (#8614)

Simply doing units.registry[np.datetime64] = DatetimeConverter() solves the issue

@agijsberts Thanks a lot for shedding your light on this!

@agijsberts
Contributor

@jorisvandenbossche I'm glad to help. And yes, DatetimeConverter should work for datetime64; it actually exploits the fact that DatetimeIndex is stored as datetime64[ns].

@jorisvandenbossche
Member Author

OK, I will then close this PR as it is totally the wrong approach :-)
and open a new one to register the converter for datetime64. Or would there be people who rely on the fact that datetime64 arrays are treated as ints in matplotlib? Let's discuss further in #8655.

@jreback jreback modified the milestones: 0.15.2, 0.15.1 Oct 30, 2014