-
Notifications
You must be signed in to change notification settings - Fork 227
Add timestampFormat and dateFormat parsing options #308
Conversation
…ault timestamp of yyyy-MM-dd'T'HH:mm:ss.SSSXXX aren't consistent with timezone awareness in java.sql.timestamp
Codecov Report
@@ Coverage Diff @@
## master #308 +/- ##
==========================================
- Coverage 88.54% 88.53% -0.01%
==========================================
Files 14 14
Lines 733 750 +17
Branches 93 96 +3
==========================================
+ Hits 649 664 +15
- Misses 84 86 +2
Continue to review full report at Codecov.
|
|
Need to address prior commits to master that used the same files as this PR. |
|
|
||
| assert(df.collect().length == numFiasHouses) | ||
| assert(df.select().where("_HOUSEID is null").count() == 0) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add back
| rootValuesMap.foreach { | ||
| case (f, v) => | ||
| nameToDataType += (f -> ArrayBuffer(inferFrom(v, options))) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add back
| case (f, v) => | ||
| nameToDataType += (f -> ArrayBuffer(inferFrom(v, options))) | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
| import java.util.Locale | ||
|
|
||
| import org.scalatest.FunSuite | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add back for style
| val numBooksComplicated = 3 | ||
| val numTopics = 1 | ||
| val numGPS = 2 | ||
| val numFiasHouses = 37 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add
| val booksRootTag = "books" | ||
| val topicsTag = "Topic" | ||
| val agesTag = "person" | ||
| val fiasRowTag = "House" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add
| val simpleNestedObjects = "src/test/resources/simple-nested-objects.xml" | ||
| val nestedElementWithNameOfParent = "src/test/resources/nested-element-with-name-of-parent.xml" | ||
| val booksMalformedAttributes = "src/test/resources/books-malformed-attributes.xml" | ||
| val fiasHouse = "src/test/resources/fias_house.xml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add
| // We need to manually merges the fields having the sames so that | ||
| // This can be inferred as ArrayType. | ||
| nameToDataType.foreach { | ||
| nameToDataType.foreach{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix
|
@HyukjinKwon ready for review I think. |
|
Hey @jameswinegar, thanks for addressing this one. Will take a look, merge and make a release soon. Thanks again! |
|
@HyukjinKwon just wanted to follow up. |
|
Argh, this weekend .. |
|
Following up again |
HyukjinKwon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jameswinegar, would you mind if I ask to remove dateFormat support in this PR? I think in this way we can avoid the type conflict problem in schema inference, and then just go merging it and unblock the release.
Otherwise LGTM if we only go for timestampFormat in this PR for now.
| val DEFAULT_ROOT_TAG = "ROWS" | ||
| val DEFAULT_CHARSET = "UTF-8" | ||
| val DEFAULT_NULL_VALUE = null | ||
| val DEFAULT_TIMESTAMP_FORMAT = "yyyy-MM-dd HH:mm:ss.SSS" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jameswinegar, shall we match this to CSV's in Spark? yyyy-MM-dd'T'HH:mm:ss.SSSXXX. IIRC, that complies ISO 8601 and should be good to be consistent with it.
| } | ||
| } | ||
|
|
||
| private[xml] def isDate(value: String, dateFormatter: SimpleDateFormat = null) = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jameswinegar, seems there looks no case where dateFormatter should be omitted. I think we can just isDate(value: String, dateFormatter: SimpleDateFormat)
| FloatType, | ||
| DoubleType, | ||
| TimestampType, | ||
| DateType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm .. this actually needs a more fundamental fix. We should not state the precedence in this order (date is wider then timestamp). I am pretty sure this could lead to ending up with date during resolving the type conflicts between date and timestamp.
For example,
"1999/01/11" -> date
"2017/05/15T14:58:19Z" -> timestamp
then it becomes date time which I guess will truncate the time part in the timestamp.
|
#316 actually a release blocker too. Let me revert this and make a release if no fix looks going to be landed soon. |
|
@HyukjinKwon can we put eyes on this again? see #321 |
|
@jameswinegar #321 is ready |
|
That's fixed now. Can you update this PR @jameswinegar? |
|
I'll try and get to it sometime this week.
…On Sat, Dec 22, 2018, 4:29 AM Hyukjin Kwon ***@***.*** wrote:
That's fixed now. Can you update this PR @jameswinegar
<https://github.com/jameswinegar>?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#308 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AIC0yX8tJZqFaX8MXoO7Vi7qzPG8sr2Kks5u7gmSgaJpZM4UUZqy>
.
|
|
You are welcome to reopen with new commits |
Fix issues discussed by @HyukjinKwon in #293.
Tests for using a default timestamp of yyyy-MM-dd'T'HH:mm:ss.SSSXXX aren't consistent with timezone awareness in java.sql.timestamp