Conversation

@0x0FFF (Contributor) commented Sep 1, 2015

This PR addresses issue SPARK-10392.
The problem is that for the "start of epoch" date (01 Jan 1970), the PySpark class DateType returns 0 instead of a datetime.date, due to the implementation of its return statement.

Issue reproduction on master:

```
>>> from pyspark.sql.types import *
>>> a = DateType()
>>> a.fromInternal(0)
0
>>> a.fromInternal(1)
datetime.date(1970, 1, 2)
```
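For illustration, a minimal sketch of the truthiness pitfall behind this bug. This is not Spark's actual code; it assumes `fromInternal` is roughly `return v and <date arithmetic>`, and the helper names are hypothetical:

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

def from_internal_buggy(v):
    # `v and x` evaluates to v itself when v is falsy, and 0 is falsy,
    # so the epoch day count 0 short-circuits and is returned as-is.
    return v and EPOCH + timedelta(days=v)

def from_internal_fixed(v):
    # Explicit None check, in the spirit of the fix in this PR.
    if v is None:
        return None
    return EPOCH + timedelta(days=v)

print(from_internal_buggy(0))  # 0, not date(1970, 1, 1)
print(from_internal_fixed(0))  # 1970-01-01
```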

@JoshRosen (Contributor)

Jenkins, this is ok to test.

@SparkQA commented Sep 1, 2015

Test build #41875 has finished for PR 8556 at commit 7985de7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14 (Contributor)

@davies (Contributor)

I think this change is not needed: d will always be truthy if it's not None.

@0x0FFF (Contributor, Author)

It will be falsy for None, an empty string, an empty list, or 0. In those cases the operand itself is returned instead of the date:

```
>>> [] and 'foo'
[]
>>> 0 and 'foo'
0
>>> 1 and 'foo'
'foo'
```

@davies (Contributor)

d cannot be a list/int/string; it can only be a date or None.

@0x0FFF (Contributor, Author)

Sorry, I see. Should I revert this change? It is mostly so that both methods follow the same logic, as `return x and y` is not very readable code in my opinion. Also, the class TimestampType is implemented in a similar way: both methods start by checking the input parameter for None.
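For context, a hedged sketch of the None-check style being described, loosely modeled on how a timestamp type might guard both conversion directions. The class name and internal representation here are assumptions for illustration, not PySpark's actual API:

```python
from datetime import datetime

class TimestampLike:
    """Illustrative only: both converters start with an explicit None check."""

    def toInternal(self, dt):
        if dt is None:
            return None
        # Assumed internal representation: microseconds since the epoch.
        return int(dt.timestamp() * 1_000_000)

    def fromInternal(self, ts):
        if ts is None:
            return None
        return datetime.fromtimestamp(ts / 1_000_000)
```

The point is stylistic: `if x is None: return None` states the intent directly, whereas `return x and f(x)` silently conflates None with every other falsy value.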

@davies (Contributor)

Either is good to me.

@davies (Contributor) commented Sep 1, 2015

@0x0FFF Thanks for working on this, could you add a regression test for it?

@0x0FFF (Contributor, Author) commented Sep 1, 2015

Added a regression test.
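For readers following along, a self-contained sketch of what such a regression test looks like. The real test exercises PySpark's DateType.fromInternal; this stand-in re-implements the fixed conversion locally, and the test class and helper names here are hypothetical:

```python
import unittest
from datetime import date, timedelta

def from_internal(v):
    # Stand-in for the fixed DateType.fromInternal: explicit None check
    # instead of relying on truthiness.
    if v is None:
        return None
    return date(1970, 1, 1) + timedelta(days=v)

class DateTypeRegressionTest(unittest.TestCase):
    def test_epoch_date(self):
        # SPARK-10392: day 0 must map to the epoch date, not to 0.
        self.assertEqual(from_internal(0), date(1970, 1, 1))
        self.assertEqual(from_internal(1), date(1970, 1, 2))
        self.assertIsNone(from_internal(None))

if __name__ == "__main__":
    unittest.main()
```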

@SparkQA commented Sep 1, 2015

Test build #41886 has finished for PR 8556 at commit f41b125.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class JavaTrainValidationSplitExample
    • class DCT(JavaTransformer, HasInputCol, HasOutputCol):
    • class SQLTransformer(JavaTransformer):
    • case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode
    • case class UnionNode(children: Seq[LocalNode]) extends LocalNode

Contributor

This test case doesn't need sqlCtx; it's better placed inside DataTypeTests.

@0x0FFF (Contributor, Author) commented Sep 1, 2015

Moved the regression test to the DataTypeTests class.

@davies (Contributor) commented Sep 1, 2015

LGTM, will merge once it passes tests.

@SparkQA commented Sep 1, 2015

Test build #41889 has finished for PR 8556 at commit a7fd681.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@0x0FFF (Contributor, Author) commented Sep 1, 2015

The MLlib test failed for Python 2.6; I didn't change anything that could have affected it, and the same test passes locally on my machine.

@0x0FFF (Contributor, Author) commented Sep 1, 2015

Jenkins, retest this please.

@davies (Contributor) commented Sep 1, 2015

@0x0FFF The JVM died during tests.

@SparkQA commented Sep 1, 2015

Test build #1708 has finished for PR 8556 at commit a7fd681.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class DCT(JavaTransformer, HasInputCol, HasOutputCol):
    • class SQLTransformer(JavaTransformer):

@davies (Contributor) commented Sep 1, 2015

Merged into master and the 1.5 branch, thanks!

@asfgit asfgit closed this in 00d9af5 Sep 1, 2015
asfgit pushed a commit that referenced this pull request Sep 1, 2015
This PR addresses issue [SPARK-10392](https://issues.apache.org/jira/browse/SPARK-10392)
The problem is that for the "start of epoch" date (01 Jan 1970), the PySpark class DateType returns 0 instead of a `datetime.date`, due to the implementation of its return statement.

Issue reproduction on master:
```
>>> from pyspark.sql.types import *
>>> a = DateType()
>>> a.fromInternal(0)
0
>>> a.fromInternal(1)
datetime.date(1970, 1, 2)
```

Author: 0x0FFF <[email protected]>

Closes #8556 from 0x0FFF/SPARK-10392.
@SparkQA commented Sep 1, 2015

Test build #1709 has finished for PR 8556 at commit a7fd681.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class DCT(JavaTransformer, HasInputCol, HasOutputCol):
    • class SQLTransformer(JavaTransformer):
