[SPARK-16216][SQL][FOLLOWUP] Enable timestamp type tests for JSON and verify all unsupported types in CSV #14829
Conversation
    case FloatType | DoubleType | _: DecimalType | BooleanType | StringType =>
      (row: InternalRow, ordinal: Int) => row.get(ordinal, dataType).toString
What is the difference between this case and the case above? The result is the same, right?
Yep, same. I just split them. I can unify them.
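The unification discussed here can be sketched as follows. This is an illustrative, self-contained sketch only (the `DataType` hierarchy below is a simplified stand-in, not Spark's real Catalyst types): two match cases with identical bodies merge into a single case whose alternatives are joined with `|`.

```scala
// Simplified stand-ins for Spark's Catalyst data types (illustrative only).
sealed trait DataType
case object FloatType extends DataType
case object DoubleType extends DataType
case object BooleanType extends DataType
case object StringType extends DataType
final case class DecimalType(precision: Int, scale: Int) extends DataType

// One case covering every type whose CSV representation is just `toString`,
// instead of two separate cases with identical bodies.
def makeConverter(dataType: DataType): Any => String = dataType match {
  case FloatType | DoubleType | _: DecimalType | BooleanType | StringType =>
    (value: Any) => value.toString
}

println(makeConverter(DoubleType)(1.5))    // prints "1.5"
println(makeConverter(StringType)("abc"))  // prints "abc"
```

Since the bodies were byte-for-byte identical, merging the cases changes no behavior; it only removes the duplication the reviewer pointed out.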
Thank you for your review @hvanhovell. I will quickly address your comments.
    // `TimestampType` is disabled because `DatatypeConverter.parseDateTime()`
    // in `DateTimeUtils` parses the formatted string wrongly when the date is
    // too early (e.g. "1600-07-13T08:36:32.847").
    case _: TimestampType => false
Has this been fixed?
I should have enabled this in the previous PR. My initial proposal was only to change the write path, while both CSV and JSON still used DatatypeConverter.parseDateTime on the read path.
However, the previous PR ended up changing the read path as well, switching from DatatypeConverter.parseDateTime to FastDateFormat, which does not have the problem described above.
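A minimal sketch of the behavior the switch relies on, assuming Apache Commons Lang 3 (the library that provides `FastDateFormat`) is on the classpath. The pattern string and UTC timezone below are illustrative choices for the example date from the disabled test, not necessarily the exact configuration Spark uses.

```scala
import java.util.TimeZone
import org.apache.commons.lang3.time.FastDateFormat

// FastDateFormat is a thread-safe formatter/parser from Commons Lang.
// It handles early dates such as "1600-07-13T08:36:32.847", which the
// previous DatatypeConverter.parseDateTime-based read path misparsed.
val format = FastDateFormat.getInstance(
  "yyyy-MM-dd'T'HH:mm:ss.SSS", TimeZone.getTimeZone("UTC"))

val parsed = format.parse("1600-07-13T08:36:32.847")

// A date in the year 1600 lies well before the Unix epoch, so its
// millisecond value must be negative.
println(parsed.getTime < 0)  // prints "true"
```

With the read path parsing such dates correctly, the `TimestampType` test that this comment disabled can be turned back on.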
Test build #64468 has finished for PR 14829 at commit
Test build #64472 has finished for PR 14829 at commit
(@hvanhovell I don't mind keeping only the change that enables the test and dropping the other changes, if you are not too sure about them, as it seems this ended up being a different issue anyway.)
@HyukjinKwon this is fine. It is a closely related issue. LGTM - pending jenkins.
Thank you very much for your close look.
Test build #64479 has finished for PR 14829 at commit
LGTM - merging to master. Thanks!
@HyukjinKwon should this be backported? If so, could you reopen it for 2.0?
Test build #64482 has finished for PR 14829 at commit
Yes, I will. Thanks.
…type tests for JSON and verify all unsupported types in CSV

## What changes were proposed in this pull request?

This backports #14829

## How was this patch tested?

Tests in `JsonHadoopFsRelation` and `CSVSuite`.

Author: hyukjinkwon <[email protected]>

Closes #14840 from HyukjinKwon/SPARK-16216-followup-backport.
What changes were proposed in this pull request?

This PR enables the tests for `TimestampType` for JSON and unifies the logic for verifying the schema when writing in CSV. In more detail, this PR:

- Enables the tests for `TimestampType` for JSON.

  This was disabled due to an issue in `DatatypeConverter.parseDateTime`, which parses dates incorrectly when the date is too early (e.g. "1600-07-13T08:36:32.847"). However, since we use `FastDateFormat`, it seems we are safe now.

- Verifies all unsupported types in CSV.

  There is separate logic to verify the schemas in `CSVFileFormat`. This is actually not quite correct because we don't support `NullType` and `CalendarIntervalType`, as well as `StructType`, `ArrayType`, and `MapType`. So, this PR adds both types.

How was this patch tested?

Tests in `JsonHadoopFsRelation` and `CSVSuite`.
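The kind of schema verification described above can be sketched as follows. This is an illustrative, self-contained sketch only (the `DataType` hierarchy and `verifySchema` helper below are simplified stand-ins, not Spark's actual API): a single check walks the field types and rejects everything the CSV source cannot write, including the `NullType` and `CalendarIntervalType` cases this PR adds alongside the complex types.

```scala
// Simplified stand-ins for Spark's Catalyst data types (illustrative only).
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType
case object NullType extends DataType
case object CalendarIntervalType extends DataType
final case class ArrayType(elementType: DataType) extends DataType
final case class MapType(keyType: DataType, valueType: DataType) extends DataType
final case class StructType(fieldTypes: Seq[DataType]) extends DataType

// Reject every type the CSV source cannot write: the complex types plus
// NullType and CalendarIntervalType. Atomic types pass through silently.
def verifySchema(fieldTypes: Seq[DataType]): Unit = fieldTypes.foreach {
  case NullType | CalendarIntervalType | _: ArrayType | _: MapType | _: StructType =>
    throw new UnsupportedOperationException(
      "CSV data source does not support this data type.")
  case _ => // atomic types are supported
}

verifySchema(Seq(StringType, IntegerType))   // passes silently
// verifySchema(Seq(ArrayType(StringType)))  // would throw
```

Putting all unsupported types behind one check like this keeps the write-path validation in a single place, which is the unification the PR description refers to.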