
Conversation

@HyukjinKwon (Member) commented Aug 26, 2016

What changes were proposed in this pull request?

This PR enables the TimestampType tests for JSON and unifies the logic for verifying the schema when writing CSV.

In more detail, this PR:

  • Enables the TimestampType tests for JSON.

    These tests were disabled due to an issue in DatatypeConverter.parseDateTime, which parses early dates incorrectly, for example:

    val d = javax.xml.bind.DatatypeConverter.parseDateTime("0900-01-01T00:00:00.000").getTime
    println(d.toString)
    // Fri Dec 28 00:00:00 KST 899

    However, since we use FastDateFormat, it seems we are safe now.

    import org.apache.commons.lang3.time.FastDateFormat

    val d = FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss.SSS").parse("0900-01-01T00:00:00.000")
    println(d)
    // Tue Jan 01 00:00:00 PST 900
  • Verifies all unsupported types in CSV

    There is separate logic to verify the schema in CSVFileFormat. It is not complete because, in addition to StructType, ArrayType, and MapType, we also do not support NullType and CalendarIntervalType. So, this PR adds checks for both of those types as well (see the sketch below).
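
    A minimal sketch of what such a schema check could look like (the object and method names below are illustrative, not the actual CSVFileFormat code):

    import org.apache.spark.sql.types._

    // Illustrative sketch: reject types that the CSV writer cannot represent.
    // The real check lives in CSVFileFormat; the names here are hypothetical.
    object CsvSchemaCheck {
      def verifySchema(schema: StructType): Unit = {
        schema.foreach { field =>
          field.dataType match {
            case _: ArrayType | _: MapType | _: StructType | NullType | CalendarIntervalType =>
              throw new UnsupportedOperationException(
                s"CSV data source does not support ${field.dataType.simpleString} data type.")
            case _ =>
              // Atomic types such as StringType, IntegerType, and TimestampType are fine.
          }
        }
      }
    }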

How was this patch tested?

Tests in JsonHadoopFsRelation and CSVSuite
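
For readers who want to try the unsupported-type behavior outside the suites, a minimal standalone check along the same lines might look like this (the object name, app name, and output path are illustrative; the actual coverage is in CSVSuite and JsonHadoopFsRelation):

import org.apache.spark.sql.SparkSession

// Illustrative standalone check: writing a column of an unsupported type
// (here an array) to CSV should fail with an "unsupported type" error.
object CsvUnsupportedTypeCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("csv-unsupported-type-check").getOrCreate()
    val df = spark.range(1).selectExpr("array(id) AS arr")
    try {
      df.write.csv("/tmp/csv-unsupported-type-check") // path is illustrative
      println("Unexpected: the write succeeded")
    } catch {
      case e: Exception => println(s"Write failed as expected: ${e.getMessage}")
    } finally {
      spark.stop()
    }
  }
}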

@HyukjinKwon changed the title from [SPARK-16216][SQL][FOLLOWUP] Enable JSON types for timestamp and clean up writing logics to [SPARK-16216][SQL][FOLLOWUP] Enable timestamp type tests for JSON and clean up writing logics on Aug 26, 2016
@HyukjinKwon changed the title from [SPARK-16216][SQL][FOLLOWUP] Enable timestamp type tests for JSON and clean up writing logics to [SPARK-16216][SQL][FOLLOWUP] Enable timestamp type tests for JSON and clean up writing logics in CSV on Aug 26, 2016
  (row: InternalRow, ordinal: Int) => row.get(ordinal, dataType).toString

case FloatType | DoubleType | _: DecimalType | BooleanType | StringType =>
  (row: InternalRow, ordinal: Int) => row.get(ordinal, dataType).toString

Contributor commented on this diff:

What is the difference between this case and the case above? The result is the same, right?

Member Author commented:

Yes, they are the same. I just split them; I can unify them.
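
A unified version could look roughly like the following (a sketch only; the enclosing object and method names and the exact set of types are assumptions, not the merged code):

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types._

object ConverterSketch {
  // Sketch: when several type branches build the exact same converter, they can
  // share a single case. The type list below is illustrative.
  def makeToStringConverter(dataType: DataType): (InternalRow, Int) => String =
    dataType match {
      case ByteType | ShortType | IntegerType | LongType |
           FloatType | DoubleType | _: DecimalType | BooleanType | StringType =>
        (row: InternalRow, ordinal: Int) => row.get(ordinal, dataType).toString
      case other =>
        throw new UnsupportedOperationException(
          s"No converter defined here for ${other.simpleString}")
    }
}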

@HyukjinKwon (Member Author) commented:

Thank you for your review, @hvanhovell. I will quickly address your comments.

@HyukjinKwon changed the title from [SPARK-16216][SQL][FOLLOWUP] Enable timestamp type tests for JSON and clean up writing logics in CSV to [SPARK-16216][SQL][FOLLOWUP] Enable timestamp type tests for JSON and verify all unsupported types in CSV on Aug 26, 2016
// `TimestampType` is disabled because `DatatypeConverter.parseDateTime()`
// in `DateTimeUtils` parses the formatted string wrongly when the date is
// too early. (e.g. "1600-07-13T08:36:32.847").
case _: TimestampType => false

Contributor commented on this diff:

Has this been fixed?

@HyukjinKwon (Member Author) commented Aug 26, 2016:

I should have enabled this in the previous PR. My initial proposal was only to change the write path; at that point both CSV and JSON still used DatatypeConverter.parseDateTime on the read path.

However, the previous PR ended up changing the read path as well, switching from DatatypeConverter.parseDateTime to FastDateFormat, which does not have the problem described above.
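
For reference, a rough sketch of parsing a timestamp with FastDateFormat on the read path (the pattern and the microseconds conversion are illustrative, not the exact code in the CSV/JSON options or DateTimeUtils):

import org.apache.commons.lang3.time.FastDateFormat

// Illustrative only: parse a timestamp string with FastDateFormat and convert
// it to microseconds since the epoch, which is how Spark stores TimestampType
// values internally. Early dates such as year 900 no longer shift, unlike
// javax.xml.bind.DatatypeConverter.parseDateTime.
val format = FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss.SSS")
val micros = format.parse("0900-01-01T00:00:00.000").getTime * 1000L
println(micros)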

@SparkQA commented Aug 26, 2016:

Test build #64468 has finished for PR 14829 at commit 0f83127.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 26, 2016:

Test build #64472 has finished for PR 14829 at commit 831268d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member Author) commented:

(@hvanhovell, I don't mind keeping only the test-enabling change and dropping the other changes if you are not sure about them, since this ended up being a somewhat different issue anyway.)

@hvanhovell (Contributor) commented:
@HyukjinKwon this is fine. It is a closely related issue.

LGTM - pending jenkins.

@HyukjinKwon (Member Author) commented:
Thank you very much for your close look.

@SparkQA commented Aug 26, 2016:

Test build #64479 has finished for PR 14829 at commit 948b456.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell (Contributor) commented:
LGTM - merging to master. Thanks!

@asfgit closed this in 6063d59 on Aug 26, 2016
@hvanhovell (Contributor) commented:

@HyukjinKwon, should this be backported? If so, could you reopen it for 2.0?

@SparkQA commented Aug 26, 2016:

Test build #64482 has finished for PR 14829 at commit 1baacf2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member Author) commented:
Yes, I will. Thanks

asfgit pushed a commit that referenced this pull request Aug 28, 2016
…type tests for JSON and verify all unsupported types in CSV

## What changes were proposed in this pull request?

This backports #14829

## How was this patch tested?

Tests in `JsonHadoopFsRelation` and `CSVSuite`.

Author: hyukjinkwon <[email protected]>

Closes #14840 from HyukjinKwon/SPARK-16216-followup-backport.
@HyukjinKwon deleted the SPARK-16216-followup branch on January 2, 2018 03:43