Skip to content

Conversation

@dtpeacock
Copy link
Contributor

This closes #85 and closes #97 .

This seems to be most sensible behavior - if the schema specifies a nullable type, it should return null on an empty field.

To maintain backwards compatibility, I have left it so that in the case of a StringType, "" is passed back rather than null (i.e. fieldname='' does not have to be changed to fieldname=null).

Its worth confirming what the desired behavior should be here as in #86 - should "" be treated as null or an empty string, as here would be the place to change it.

@codecov-io
Copy link

Current coverage is 85.31%

Merging #102 into master will not effect coverage as of 3570e1c

@@            master    #102   diff @@
======================================
  Files           10      10       
  Stmts          388     388       
  Branches       117     117       
  Methods          0       0       
======================================
  Hit            331     331       
  Partial          0       0       
  Missed          57      57       

Review entire Coverage Diff

Powered by Codecov

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to check for nullability for StringType. Basically I think the following logic would be simple an intuitive:

  • if StringType just return the token (don't need to check for nullability)
  • if any other time, if token is "", return null, else cast.

I suggest moving this logic (along with some comment that explains it) to TypeCast.castTo (maybe as a simple private method). This way you can add a few simple unit tests for it. Added benefit is we keep CsvRelation simpler.

@falaki
Copy link
Member

falaki commented Jul 16, 2015

Thanks @dtpeacock for submitting this. I left one comment, please address that and I will merge it. I am planning to cut a release after this.

@dtpeacock
Copy link
Contributor Author

Thanks, I've addressed the comment and added a couple more tests!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be simpler:

if (datum == "" && nullable && !castType.isInstanceOf[StringType]) {
  null
} else {
  castType match {
      case _: ByteType => datum.toByte
      case _: ShortType => datum.toShort
      case _: IntegerType => datum.toInt
      case _: LongType => datum.toLong
      case _: FloatType => datum.toFloat
      case _: DoubleType => datum.toDouble
      case _: BooleanType => datum.toBoolean
      case _: DecimalType => new BigDecimal(datum.replaceAll(",", ""))
      // TODO(hossein): would be good to support other common timestamp formats
      case _: TimestampType => Timestamp.valueOf(datum)
      // TODO(hossein): would be good to support other common date formats
      case _: DateType => Date.valueOf(datum)
      case _: StringType => datum
      case _ => throw new RuntimeException(s"Unsupported type: ${castType.typeName}")
  }
}

@dtpeacock
Copy link
Contributor Author

Agreed that logic is a bit simpler

falaki added a commit that referenced this pull request Jul 17, 2015
@falaki falaki merged commit 451cceb into databricks:master Jul 17, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants