[SPARK-10392] [SQL] Pyspark - Wrong DateType support on JDBC connection #8556
Conversation
Jenkins, this is ok to test.

Test build #41875 has finished for PR 8556 at commit
I think this change is not needed; `d` will always be truthy if it's not None.
It will be falsy for None, an empty string, an empty list, or 0, in which case the value itself is returned instead of the date:
```
>>> [] and 'foo'
[]
>>> 0 and 'foo'
0
>>> 1 and 'foo'
'foo'
```
`d` cannot be a list/int/string; it can only be a Date or None.
Sorry, I see. Should I revert this change? It is mostly so that both methods follow the same logic, as `return x and y` is not very readable code in my opinion. Also, the TimestampType class is implemented in a similar way: both of its methods start with a check of the input parameter for None.
Either is good to me.
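The pattern being debated above can be sketched as follows. This is a simplified illustration, not the exact Spark source; the function names `from_internal_buggy` and `from_internal_fixed` are made up for the example:

```python
import datetime

# days between year 1 and the Unix epoch, mirroring the offset
# used by pyspark.sql.types.DateType
EPOCH_ORDINAL = datetime.datetime(1970, 1, 1).toordinal()

def from_internal_buggy(v):
    # `v and x` evaluates to v itself when v is falsy, so the epoch
    # (internal value 0) comes back as the integer 0, not a date
    return v and datetime.date.fromordinal(v + EPOCH_ORDINAL)

def from_internal_fixed(v):
    # explicit None check: 0 is a perfectly valid day offset
    if v is not None:
        return datetime.date.fromordinal(v + EPOCH_ORDINAL)

print(from_internal_buggy(0))  # 0 (wrong)
print(from_internal_fixed(0))  # 1970-01-01
```

The explicit `is not None` check is the readable variant the author argues for; it treats 0 as data rather than as "no value".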
@0x0FFF Thanks for working on this, could you add a regression test for it?

Added regression test

Test build #41886 has finished for PR 8556 at commit
python/pyspark/sql/tests.py
This test case doesn't need `sqlCtx`; it's better placed inside DataTypeTests.
Moved regression test to DataTypeTests class
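A regression test of the kind moved into `DataTypeTests` might look like the sketch below. The `DateType` here is a minimal local stand-in for `pyspark.sql.types.DateType` so the example runs on its own, and the test name is illustrative rather than copied from python/pyspark/sql/tests.py:

```python
import datetime
import unittest

class DateType(object):
    """Minimal stand-in for pyspark.sql.types.DateType (illustration only)."""
    EPOCH_ORDINAL = datetime.datetime(1970, 1, 1).toordinal()

    def fromInternal(self, v):
        # explicit None check, so internal value 0 maps to the epoch date
        if v is not None:
            return datetime.date.fromordinal(v + self.EPOCH_ORDINAL)

class DataTypeTests(unittest.TestCase):
    # regression check for SPARK-10392: internal value 0 must map to
    # datetime.date(1970, 1, 1), not to the integer 0
    def test_datetype_equal_zero(self):
        dt = DateType()
        self.assertEqual(dt.fromInternal(0), datetime.date(1970, 1, 1))
```

No SQLContext is needed for this check, which is why the reviewer suggested a plain data-type test class rather than a suite that spins up `sqlCtx`.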
LGTM, will merge once it passes tests.
Test build #41889 has finished for PR 8556 at commit
The mllib test failed for Python 2.6, but I didn't change anything that could have affected it. The same test passes locally on my machine.
Jenkins, retest this please.

@0x0FFF The JVM died during tests.

Test build #1708 has finished for PR 8556 at commit

merged into master and 1.5 branch, thanks!
This PR addresses issue [SPARK-10392](https://issues.apache.org/jira/browse/SPARK-10392). The problem is that for the "start of epoch" date (01 Jan 1970), the PySpark class DateType returns 0 instead of the `datetime.date`, due to the implementation of its return statement.

Issue reproduction on master:

```
>>> from pyspark.sql.types import *
>>> a = DateType()
>>> a.fromInternal(0)
0
>>> a.fromInternal(1)
datetime.date(1970, 1, 2)
```

Author: 0x0FFF <[email protected]>

Closes #8556 from 0x0FFF/SPARK-10392.
Test build #1709 has finished for PR 8556 at commit