Conversation

@0x0FFF (Contributor) commented Sep 1, 2015

This PR addresses issue SPARK-10392.
The problem is that for the "start of epoch" date (01 Jan 1970), the PySpark class DateType returns 0 instead of a datetime.date, due to the implementation of its return statement.

Issue reproduction on master:

```
>>> from pyspark.sql.types import *
>>> a = DateType()
>>> a.fromInternal(0)
0
>>> a.fromInternal(1)
datetime.date(1970, 1, 2)
```
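For illustration, a minimal sketch of the truthiness pitfall behind this bug. This is not Spark's actual code; it assumes `fromInternal` is roughly `return v and <date arithmetic>`, and the helper names are hypothetical:

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

def from_internal_buggy(v):
    # `v and x` evaluates to v itself when v is falsy, and 0 is falsy,
    # so the epoch day count 0 short-circuits and is returned as-is.
    return v and EPOCH + timedelta(days=v)

def from_internal_fixed(v):
    # Explicit None check, in the spirit of the fix in this PR.
    if v is None:
        return None
    return EPOCH + timedelta(days=v)

print(from_internal_buggy(0))  # 0, not date(1970, 1, 1)
print(from_internal_fixed(0))  # 1970-01-01
```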

@JoshRosen (Contributor)

Jenkins, this is ok to test.

@SparkQA commented Sep 1, 2015

Test build #41875 has finished for PR 8556 at commit 7985de7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14 (Contributor)

@davies (Contributor)

I think this change is not needed: d will always be truthy if it's not None.

@0x0FFF (Contributor, Author)

It will be falsy for None, an empty string, an empty list, or 0. In those cases the operand itself is returned instead of the date:

```
>>> [] and 'foo'
[]
>>> 0 and 'foo'
0
>>> 1 and 'foo'
'foo'
```

@davies (Contributor)

d cannot be a list/int/string; it can only be a date or None.

@0x0FFF (Contributor, Author)

Sorry, I see. Should I revert this change? It is mostly so that both methods follow the same logic, as `return x and y` is not very readable code in my opinion. Also, the class TimestampType is implemented in a similar way: both methods start by checking the input parameter for None.
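For context, a hedged sketch of the None-check style being described, loosely modeled on how a timestamp type might guard both conversion directions. The class name and internal representation here are assumptions for illustration, not PySpark's actual API:

```python
from datetime import datetime

class TimestampLike:
    """Illustrative only: both converters start with an explicit None check."""

    def toInternal(self, dt):
        if dt is None:
            return None
        # Assumed internal representation: microseconds since the epoch.
        return int(dt.timestamp() * 1_000_000)

    def fromInternal(self, ts):
        if ts is None:
            return None
        return datetime.fromtimestamp(ts / 1_000_000)
```

The point is stylistic: `if x is None: return None` states the intent directly, whereas `return x and f(x)` silently conflates None with every other falsy value.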

@davies (Contributor)

Either is good to me.

@davies (Contributor) commented Sep 1, 2015

@0x0FFF Thanks for working on this, could you add a regression test for it?

@0x0FFF (Contributor, Author) commented Sep 1, 2015

Added a regression test.
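For readers following along, a self-contained sketch of what such a regression test looks like. The real test exercises PySpark's DateType.fromInternal; this stand-in re-implements the fixed conversion locally, and the test class and helper names here are hypothetical:

```python
import unittest
from datetime import date, timedelta

def from_internal(v):
    # Stand-in for the fixed DateType.fromInternal: explicit None check
    # instead of relying on truthiness.
    if v is None:
        return None
    return date(1970, 1, 1) + timedelta(days=v)

class DateTypeRegressionTest(unittest.TestCase):
    def test_epoch_date(self):
        # SPARK-10392: day 0 must map to the epoch date, not to 0.
        self.assertEqual(from_internal(0), date(1970, 1, 1))
        self.assertEqual(from_internal(1), date(1970, 1, 2))
        self.assertIsNone(from_internal(None))

if __name__ == "__main__":
    unittest.main()
```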

@SparkQA commented Sep 1, 2015

Test build #41886 has finished for PR 8556 at commit f41b125.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class JavaTrainValidationSplitExample
    • class DCT(JavaTransformer, HasInputCol, HasOutputCol):
    • class SQLTransformer(JavaTransformer):
    • case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode
    • case class UnionNode(children: Seq[LocalNode]) extends LocalNode

Contributor

This test case doesn't need sqlCtx; it's better placed inside DataTypeTests.

@0x0FFF (Contributor, Author) commented Sep 1, 2015

Moved the regression test to the DataTypeTests class.

@davies (Contributor) commented Sep 1, 2015

LGTM, will merge once it passes tests.

@SparkQA commented Sep 1, 2015

Test build #41889 has finished for PR 8556 at commit a7fd681.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@0x0FFF (Contributor, Author) commented Sep 1, 2015

The MLlib test failed for Python 2.6; I didn't change anything that could have affected it, and the same test passes locally on my machine.

@0x0FFF (Contributor, Author) commented Sep 1, 2015

Jenkins, retest this please.

@davies (Contributor) commented Sep 1, 2015

@0x0FFF The JVM died during tests.

@SparkQA commented Sep 1, 2015

Test build #1708 has finished for PR 8556 at commit a7fd681.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class DCT(JavaTransformer, HasInputCol, HasOutputCol):
    • class SQLTransformer(JavaTransformer):

@davies (Contributor) commented Sep 1, 2015

Merged into master and the 1.5 branch, thanks!

@asfgit asfgit closed this in 00d9af5 Sep 1, 2015
asfgit pushed a commit that referenced this pull request Sep 1, 2015
This PR addresses issue [SPARK-10392](https://issues.apache.org/jira/browse/SPARK-10392)
The problem is that for the "start of epoch" date (01 Jan 1970), the PySpark class DateType returns 0 instead of a `datetime.date`, due to the implementation of its return statement.

Issue reproduction on master:
```
>>> from pyspark.sql.types import *
>>> a = DateType()
>>> a.fromInternal(0)
0
>>> a.fromInternal(1)
datetime.date(1970, 1, 2)
```

Author: 0x0FFF <[email protected]>

Closes #8556 from 0x0FFF/SPARK-10392.
@SparkQA commented Sep 1, 2015

Test build #1709 has finished for PR 8556 at commit a7fd681.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class DCT(JavaTransformer, HasInputCol, HasOutputCol):
    • class SQLTransformer(JavaTransformer):
