This repository was archived by the owner on Mar 24, 2025. It is now read-only.

Conversation

@jameswinegar

Fix issues discussed by @HyukjinKwon in #293.
Tests that use a default timestamp format of yyyy-MM-dd'T'HH:mm:ss.SSSXXX aren't consistent with the timezone awareness of java.sql.Timestamp.

@codecov-io

codecov-io commented May 31, 2018

Codecov Report

Merging #308 into master will decrease coverage by <.01%.
The diff coverage is 95.65%.

@@            Coverage Diff             @@
##           master     #308      +/-   ##
==========================================
- Coverage   88.54%   88.53%   -0.01%     
==========================================
  Files          14       14              
  Lines         733      750      +17     
  Branches       93       96       +3     
==========================================
+ Hits          649      664      +15     
- Misses         84       86       +2

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...la/com/databricks/spark/xml/util/InferSchema.scala | 87.32% <100%> (+0.18%) | ⬆️ |
| ...atabricks/spark/xml/parsers/StaxXmlGenerator.scala | 95.55% <100%> (ø) | ⬆️ |
| ...in/scala/com/databricks/spark/xml/XmlOptions.scala | 97.5% <100%> (+0.44%) | ⬆️ |
| ...scala/com/databricks/spark/xml/util/TypeCast.scala | 81.31% <91.66%> (-0.39%) | ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f3772b2...629d3db. Read the comment docs.

@jameswinegar
Author

jameswinegar commented May 31, 2018

Need to address prior commits to master that used the same files as this PR.


assert(df.collect().length == numFiasHouses)
assert(df.select().where("_HOUSEID is null").count() == 0)
}
Author

Add back

rootValuesMap.foreach {
case (f, v) =>
nameToDataType += (f -> ArrayBuffer(inferFrom(v, options)))
}
Author

Add back

case (f, v) =>
nameToDataType += (f -> ArrayBuffer(inferFrom(v, options)))
}

Author

remove

import java.util.Locale

import org.scalatest.FunSuite

Author

add back for style

val numBooksComplicated = 3
val numTopics = 1
val numGPS = 2
val numFiasHouses = 37
Author

add

val booksRootTag = "books"
val topicsTag = "Topic"
val agesTag = "person"
val fiasRowTag = "House"
Author

add

val simpleNestedObjects = "src/test/resources/simple-nested-objects.xml"
val nestedElementWithNameOfParent = "src/test/resources/nested-element-with-name-of-parent.xml"
val booksMalformedAttributes = "src/test/resources/books-malformed-attributes.xml"
val fiasHouse = "src/test/resources/fias_house.xml"
Author

add

// We need to manually merges the fields having the sames so that
// This can be inferred as ArrayType.
nameToDataType.foreach {
nameToDataType.foreach{
Author

fix

@jameswinegar jameswinegar reopened this May 31, 2018
@jameswinegar
Author

@HyukjinKwon ready for review I think.

@HyukjinKwon
Member

Hey @jameswinegar, thanks for addressing this one. Will take a look, merge and make a release soon. Thanks again!

@jameswinegar
Author

@HyukjinKwon just wanted to follow up.

@HyukjinKwon
Member

Argh, this weekend ..

@jameswinegar
Author

Following up again

Member

@HyukjinKwon HyukjinKwon left a comment

@jameswinegar, would you mind removing dateFormat support from this PR? I think that way we can avoid the type conflict problem in schema inference, then just merge it and unblock the release.

Otherwise LGTM if we only go for timestampFormat in this PR for now.

val DEFAULT_ROOT_TAG = "ROWS"
val DEFAULT_CHARSET = "UTF-8"
val DEFAULT_NULL_VALUE = null
val DEFAULT_TIMESTAMP_FORMAT = "yyyy-MM-dd HH:mm:ss.SSS"
Member

@jameswinegar, shall we match this to CSV's default in Spark, yyyy-MM-dd'T'HH:mm:ss.SSSXXX? IIRC, that complies with ISO 8601, and it would be good to be consistent with it.
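
As a rough illustration of the suggested pattern (a sketch only, not code from this PR): the hypothetical helper `parse` below uses java.text.SimpleDateFormat with yyyy-MM-dd'T'HH:mm:ss.SSSXXX. The XXX part accepts either Z or an offset such as +02:00, so two strings that name the same instant with different offsets should yield equal java.sql.Timestamp values.

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat

// Hypothetical object and method names, used only for illustration.
object TimestampFormatSketch {
  val pattern = "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"

  // SimpleDateFormat is not thread-safe, so a new instance is built per call.
  def parse(value: String): Timestamp = {
    val formatter = new SimpleDateFormat(pattern)
    formatter.setLenient(false)
    // The offset in the input is applied during parsing; the resulting
    // Timestamp only stores milliseconds since the epoch.
    new Timestamp(formatter.parse(value).getTime)
  }
}
```

Note that java.sql.Timestamp.toString renders the value in the JVM's default time zone, so tests that compare string forms may behave differently across machines, whereas comparing getTime does not depend on the zone.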

}
}

private[xml] def isDate(value: String, dateFormatter: SimpleDateFormat = null) = {
Member

@jameswinegar, it seems there is no case where dateFormatter should be omitted. I think we can just use isDate(value: String, dateFormatter: SimpleDateFormat).
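
A minimal sketch of what the two-argument form could look like. The body is an assumption for illustration (the PR's actual implementation is not shown in this thread), and it is wrapped in a hypothetical object `IsDateSketch` so that it is self-contained.

```scala
import java.text.SimpleDateFormat
import scala.util.Try

// Hypothetical wrapper object; only the isDate signature comes from the review comment.
object IsDateSketch {
  // The formatter is a required argument, so there is no null default to guard against.
  // Caveat: SimpleDateFormat.parse(String) only needs a matching prefix of the input,
  // so trailing text after a valid date is not rejected by this check.
  def isDate(value: String, dateFormatter: SimpleDateFormat): Boolean =
    Try(dateFormatter.parse(value)).isSuccess
}
```

A caller would then always pass the formatter built from the configured option, for example `IsDateSketch.isDate("1999-01-11", new SimpleDateFormat("yyyy-MM-dd"))`.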

FloatType,
DoubleType,
TimestampType,
DateType,
Member

Hm .. this actually needs a more fundamental fix. We should not state the precedence in this order (date is wider than timestamp). I am pretty sure this could end up producing date when resolving type conflicts between date and timestamp.

For example,

"1999/01/11" -> date
"2017/05/15T14:58:19Z" -> timestamp

then it becomes date, which I guess will truncate the time part of the timestamp.
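
To illustrate the concern with a simplified sketch (hypothetical types and names, not the spark-xml implementation): if a conflict is resolved by picking whichever type appears later in a precedence list, then timestamp has to come after date for a date/timestamp conflict to resolve to timestamp.

```scala
// Hypothetical, simplified model of precedence-based conflict resolution.
object PrecedenceSketch {
  sealed trait SimpleType
  case object DateT extends SimpleType
  case object TimestampT extends SimpleType

  // Resolve a conflict by choosing the type with the higher index in `order`,
  // i.e. the one listed later, which is treated as the wider type.
  def resolve(order: Seq[SimpleType])(a: SimpleType, b: SimpleType): SimpleType =
    if (order.indexOf(a) >= order.indexOf(b)) a else b

  // Order as in the diff under review: date listed after timestamp.
  val dateWider: Seq[SimpleType] = Seq(TimestampT, DateT)

  // Order implied by the review comment: timestamp listed after date.
  val timestampWider: Seq[SimpleType] = Seq(DateT, TimestampT)
}
```

With `dateWider`, a column holding both "1999/01/11" and "2017/05/15T14:58:19Z" would resolve to the date type in this model; with `timestampWider`, it would resolve to the timestamp type and keep the time part.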

@HyukjinKwon
Member

#316 is actually a release blocker too. Let me revert this and make a release if no fix looks likely to land soon.

@jameswinegar
Author

jameswinegar commented Sep 15, 2018

@HyukjinKwon can we put eyes on this again? see #321

@BioQwer
Contributor

BioQwer commented Nov 30, 2018

@jameswinegar #321 is ready

@HyukjinKwon
Member

That's fixed now. Can you update this PR @jameswinegar?

@jameswinegar
Author

jameswinegar commented Dec 22, 2018 via email

@srowen
Collaborator

srowen commented Oct 28, 2019

You are welcome to reopen with new commits

@srowen srowen closed this Oct 28, 2019

5 participants