-
Notifications
You must be signed in to change notification settings - Fork 441
Support for nullable schema types #102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Current coverage is
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to check for nullability for StringType. Basically I think the following logic would be simple an intuitive:
- if StringType just return the token (don't need to check for nullability)
- if any other time, if token is "", return null, else cast.
I suggest moving this logic (along with some comment that explains it) to TypeCast.castTo (maybe as a simple private method). This way you can add a few simple unit tests for it. Added benefit is we keep CsvRelation simpler.
|
Thanks @dtpeacock for submitting this. I left one comment, please address that and I will merge it. I am planning to cut a release after this. |
|
Thanks, I've addressed the comment and added a couple more tests! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be simpler:
if (datum == "" && nullable && !castType.isInstanceOf[StringType]) {
null
} else {
castType match {
case _: ByteType => datum.toByte
case _: ShortType => datum.toShort
case _: IntegerType => datum.toInt
case _: LongType => datum.toLong
case _: FloatType => datum.toFloat
case _: DoubleType => datum.toDouble
case _: BooleanType => datum.toBoolean
case _: DecimalType => new BigDecimal(datum.replaceAll(",", ""))
// TODO(hossein): would be good to support other common timestamp formats
case _: TimestampType => Timestamp.valueOf(datum)
// TODO(hossein): would be good to support other common date formats
case _: DateType => Date.valueOf(datum)
case _: StringType => datum
case _ => throw new RuntimeException(s"Unsupported type: ${castType.typeName}")
}
}|
Agreed that logic is a bit simpler |
Support for nullable schema types
This closes #85 and closes #97 .
This seems to be most sensible behavior - if the schema specifies a nullable type, it should return null on an empty field.
To maintain backwards compatibility, I have left it so that in the case of a StringType, "" is passed back rather than null (i.e. fieldname='' does not have to be changed to fieldname=null).
Its worth confirming what the desired behavior should be here as in #86 - should "" be treated as null or an empty string, as here would be the place to change it.