Skip to content

Conversation

OlivierLuG
Copy link
Contributor

@OlivierLuG OlivierLuG commented Jun 11, 2020

Closes #34621
Closes #17053

  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

The class Period do not support nanosecond. I've made a quick and dirty code to support nanoseconds. I have struggled to find an alternative to dateutil.parser, but I didn't found an alternative.

The PR may be a performance issue as I've add a Timestamp constructor to the dateutil.parser. There is for sure a better solution. Please let me know !

@mroeschke
Copy link
Member

mroeschke commented Jun 11, 2020

See #34621 (comment). I don't think this is a bug.

Edit: Misunderstood, confirmed a bug

@pep8speaks
Copy link

pep8speaks commented Jun 12, 2020

Hello @OlivierLuG! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-06-16 18:43:06 UTC

nanosecond = ts.nanosecond
if nanosecond != 0:
reso = 'nanosecond'
except (ValueError, OutOfBoundsDatetime):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOBDatetime is a subclass of ValueError so this is uneeded

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've first try to only use OutOfBoundsDatetime but the test code fails on many VM. Here is an example: https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=37155&view=logs&j=a3a13ea8-7cf0-5bdb-71bb-6ac8830ae35c&t=add65f64-6c25-5783-8fd6-d9aa1b63d9d4&l=471

pandas/_libs/tslibs/period.pyx:2410: in pandas._libs.tslibs.period.Period.__new__
    ts = Timestamp(value, freq=freq)
pandas/_libs/tslibs/timestamps.pyx:848: in pandas._libs.tslibs.timestamps.Timestamp.__new__
    ts = convert_to_tsobject(ts_input, tz, unit, 0, 0, nanosecond or 0)
pandas/_libs/tslibs/conversion.pyx:333: in pandas._libs.tslibs.conversion.convert_to_tsobject
    return convert_str_to_tsobject(ts, tz, unit, dayfirst, yearfirst)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   raise ValueError("could not convert string to Timestamp")
E   ValueError: could not convert string to Timestamp

pandas/_libs/tslibs/conversion.pyx:581: ValueError

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OlivierLuG I think Jeff meant that it should be enough to only use ValueError, as that would already cover the OutOfBoundsDatetime case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I got it. I will try a new PR with only ValueError

value = value.upper()
dt, reso = parse_time_string(value, freq)
try:
ts = Timestamp(value, freq=freq)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't pass freq

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new PR is ready (waiting for the question about else)

@jreback jreback added the Period Period data type label Jun 14, 2020
@OlivierLuG
Copy link
Contributor Author

Closes #34621
Closes #17053

@OlivierLuG OlivierLuG requested a review from jreback June 29, 2020 19:42
with pytest.raises(ValueError, match=msg):
Period("2011-01", freq="1D1W")

@pytest.mark.parametrize("day_", ["1970/01/01 ", "2020-12-31 ", "1981/09/13 "])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think the trailing underscores are necessary for these names

dt, reso = parse_time_string(value, freq)
try:
ts = Timestamp(value)
except ValueError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we end up parsing this twice which is not good, yeah i think this needs to be integrated a bit more to parse_time_string.

i guess the currently soln would be ok in the interim, but would need to run asv's on all periods to see what kind of perf hit here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far I read, nanosecond was not accepted into dateutil module. This interim solution could stand for a while.
I stay tuned...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far I read, nanosecond was not accepted into dateutil module. This interim solution could stand for a while.
I stay tuned...

we don't wait for dateutil in this, as it will not likey every support nanosecond because the standard library does not

Copy link
Contributor

@arw2019 arw2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OlivierLuG is this still active? If yes can you merge master & resolve conflicts and address comments

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you run the period asv's to see if any perf issues

dt, reso = parse_time_string(value, freq)
try:
ts = Timestamp(value)
except ValueError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far I read, nanosecond was not accepted into dateutil module. This interim solution could stand for a while.
I stay tuned...

we don't wait for dateutil in this, as it will not likey every support nanosecond because the standard library does not

@jreback
Copy link
Contributor

jreback commented Nov 26, 2020

this looked ok, pls merge master and address comments, ping on green

@arw2019
Copy link
Contributor

arw2019 commented Nov 30, 2020

Closing in favor of #38175

@arw2019 arw2019 closed this Nov 30, 2020
@OlivierLuG OlivierLuG deleted the BUG_#34621 branch January 2, 2021 08:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Period Period data type

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: Period constructor does not take into account nanonseconds BUG: Period with nanoseconds (?)

7 participants