-
Notifications
You must be signed in to change notification settings - Fork 227
Add support for values with self-closing tags #285
Conversation
add test case pump version Signed-off-by: Anton Alexandrov <[email protected]>
Signed-off-by: Anton Alexandrov <[email protected]>
Signed-off-by: Antony Alexandrov <[email protected]>
Signed-off-by: Antony Alexandrov <[email protected]>
Codecov Report
@@ Coverage Diff @@
## master #285 +/- ##
=========================================
- Coverage 88.52% 88.4% -0.13%
=========================================
Files 14 14
Lines 732 733 +1
Branches 101 98 -3
=========================================
Hits 648 648
- Misses 84 85 +1
Continue to review full report at Codecov.
|
|
Hello @HyukjinKwon |
|
@BioQwer, thanks for this fix. Seems roughly fine. Will try to take a closer look soon. |
|
@HyukjinKwon really looking forward, because it covers a lot of problems. |
|
Hi, The last release of databricks spark-XML version 0.4.1 was on Nov 6, 2016. Can someone update the next release details of databricks spark-xml? Thanks in Advance |
Signed-off-by: Antony Alexandrov <[email protected]>
Signed-off-by: Antony Alexandrov <[email protected]>
|
@HyukjinKwon hello, what about merge my changes? |
|
|
||
| require(rowTag.nonEmpty, "'rowTag' option should not be empty string.") | ||
| require(attributePrefix.nonEmpty, "'attributePrefix' option should not be empty string.") | ||
| logger.warn("'attributePrefix' option should not be empty string.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this still needs to be conditional, else it will always fire the warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RJKeevil one xml = one warning
For my xml i need empty attributePrefix, because all my values in attributes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My task it's use fias (All addresses of Russia) xml in spark.
I think many Russains will collide, in this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand, but the way the code is written it will fire the empty attribute prefix warning, whether it is empty or not? I have a prefix of _ defined and I get that warning in the logs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oy, i understand, i will fix it.
Signed-off-by: Antony Alexandrov <[email protected]>
|
When is the new version of spark-xml with this support going to be released? |
|
@jaybooth4 Hello |
Hadoop added a woodstox dependency in https://issues.apache.org/jira/browse/HADOOP-14501. When using versions of Hadoop with that dependency, WstxOutputFactory will get loaded for XMLOutputFactory.newInstance. The resulting streamwriter will ensure that a client does not write two root elements, which causes the current code to error out. This PR makes it such that the writer function will start the root tag before writing rows. Author: Patrick Woody <[email protected]> Closes #282 from pwoody/pw/multipleRootFix.
| val permissive = ParseModes.isPermissiveMode(parseMode) | ||
|
|
||
| require(rowTag.nonEmpty, "'rowTag' option should not be empty string.") | ||
| require(attributePrefix.nonEmpty, "'attributePrefix' option should not be empty string.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BioQwer, BTW, mind if I ask why we warn here instead of the exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HyukjinKwon For my xml i need empty attributePrefix, because all my values in attributes.
My task it's use fias (All addresses of Russia) xml in spark.
I think many Russains will collide, in this problem.
| options: XmlOptions, | ||
| rootAttributes: Array[Attribute] = Array.empty): Row = { | ||
| val row = new Array[Any](schema.length) | ||
| val nameToIndex = schema.map(_.name).zipWithIndex.toMap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind if I ask to elaborate this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HyukjinKwon if we hava attributes, we need them firstly before reading values.
before we have if we hava attributes, and don't have values -> don't read them. I fix it
| Some(inferObject(parser, options, rootAttributes)) | ||
| } catch { | ||
| case NonFatal(_) if shouldHandleCorruptRecord => | ||
| case NonFatal(x) if shouldHandleCorruptRecord => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think _ is fine since that's not used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will check it
| import org.apache.hadoop.io.{LongWritable, Text} | ||
| import org.apache.hadoop.io.compress.GzipCodec | ||
| import org.scalatest.{BeforeAndAfterAll, FunSuite} | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This empty line is actually a style - https://github.com/databricks/scala-style-guide#imports :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will fix it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will be better if idea will tip about it :)
| var ei = 0 | ||
| var depth = 0 | ||
|
|
||
| def checkEmptyTag(currentLetter: Int, position: Int): Boolean = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will take another look for this logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic is usualy you try find this end </rootTag>, i add check and for </>.
Each time we check both of this situation.
If you any suggestions for this logic i think it's better do on refactor in next version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, got it but I simply meant just double checking :).
add test case pump version Signed-off-by: Anton Alexandrov <[email protected]>
Signed-off-by: Anton Alexandrov <[email protected]>
Signed-off-by: Antony Alexandrov <[email protected]>
Signed-off-by: Antony Alexandrov <[email protected]>
Signed-off-by: Antony Alexandrov <[email protected]>
Signed-off-by: Antony Alexandrov <[email protected]>
Signed-off-by: Antony Alexandrov <[email protected]>
# Conflicts: # src/test/scala/com/databricks/spark/xml/XmlSuite.scala
|
I did fix it. |
|
Hi BioQwer, Any ideas so far? |
|
@jbelenag Hello! |
|
you are right. I rushed to take a look. |
|
@HyukjinKwon what about merging this PR? :) |
|
@HyukjinKwon what's happend? |
|
Guys, sorry for the late response. I manually resolved conflicts and opened #303 with his commit. It's pretty core fix so I had to be very careful before merging it in. |
add test case
pump version