[SPARK-15143][SPARK-15144][SQL] Add CSV tests with HadoopFsRelationTest and support for nullValue for other types #12921
verifyCars(cars, withHeader = true, checkValues = false)
val results = cars.collect()
assert(results(0).toSeq === Array(2012, "Tesla", "S", "null", "null"))
assert(results(0).toSeq === Array(2012, "Tesla", "S", null, null))
This is tested against the following data:
year,make,model,comment,blank
"2012","Tesla","S",null,
1997,Ford,E350,"Go get one now they are going fast",
null,Chevy,Volt
Since the header is year,make,model,comment,blank, this should produce the values 2012,Tesla,S,null,null because nullValue is set to "null".
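As a rough illustration of the expectation above, the sketch below maps every token equal to nullValue to an actual null. It is plain Scala, not Spark code: `NullValueSketch` and `applyNullValue` are hypothetical names, and it assumes the tokenizer has already filled the empty `blank` field with the nullValue string, as the old assertion (`"null", "null"`) suggests.

```scala
// Hypothetical helper for illustration only; not Spark's implementation.
object NullValueSketch {
  // Replace any token that is null or equal to nullValue with a real null.
  def applyNullValue(tokens: Seq[String], nullValue: String): Seq[String] =
    tokens.map(t => if (t == null || t == nullValue) null else t)
}

// First data row, assuming the empty trailing field was tokenized as "null".
val firstRow = Seq("2012", "Tesla", "S", "null", "null")
val cleaned = NullValueSketch.applyNullValue(firstRow, nullValue = "null")
```

With this mapping the last two columns come back as nulls rather than as the literal string "null", which is what the updated assertion checks.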
Test build #57845 has finished for PR 12921 at commit
Test build #57846 has finished for PR 12921 at commit
Test build #57847 has finished for PR 12921 at commit
DateTimeUtils.millisToDays(DateTimeUtils.stringToTime(datum).getTime)
case _: StringType => UTF8String.fromString(datum)
case _ => throw new RuntimeException(s"Unsupported type: ${castType.typeName}")
if (datum == null || (datum == options.nullValue && nullable)) {
The logic below was simply added, mirroring inferField():
if (datum == null || (datum == options.nullValue && nullable)) {
  null
} else {
  ...
}
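To make the effect of that guard concrete, here is a simplified, self-contained sketch of a cast function with the same null check placed before the type dispatch. The `ColType` hierarchy and this `castTo` signature are assumptions made for the example; they only mimic the shape of the snippet above and are not the real `CSVTypeCast.castTo`.

```scala
// Simplified stand-in for the cast logic; all names here are hypothetical.
object CastSketch {
  sealed trait ColType
  case object IntCol extends ColType
  case object BoolCol extends ColType
  case object StringCol extends ColType

  def castTo(datum: String, colType: ColType, nullable: Boolean, nullValue: String): Any =
    if (datum == null || (datum == nullValue && nullable)) {
      // The null check runs first, so it applies to every column type alike.
      null
    } else {
      colType match {
        case IntCol    => datum.toInt
        case BoolCol   => datum.toBoolean
        case StringCol => datum
      }
    }
}
```

Because the check sits ahead of the `match`, a nullable boolean or string column is handled the same way as a numeric one. A non-nullable column still falls through to the cast, so a nullValue token in an `IntCol` would raise a `NumberFormatException` in this sketch.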
Test build #57856 has finished for PR 12921 at commit
UTF8String.fromString(""))
assert(
CSVTypeCast.castTo("", StringType, nullable = false, CSVOptions()) ==
CSVTypeCast.castTo("", StringType, nullable = false, CSVOptions("nullValue", null)) ==
@falaki I just noticed that this test implies nullValue does not apply to StringType. Is StringType intentionally excluded? I thought nullValue should apply to all types equally.
Otherwise, nulls for StringType will be lost in the roundtrip of reading and writing.
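The round-trip concern can be sketched in plain Scala as follows. This assumes, for illustration only, that the writer emits the nullValue string for a null field; `RoundTripSketch` and its methods are hypothetical and do not correspond to Spark classes.

```scala
// Hypothetical model of writing and re-reading one string field.
object RoundTripSketch {
  // Assumed writer behaviour: a null field is written as the nullValue token.
  def write(value: String, nullValue: String): String =
    if (value == null) nullValue else value

  // Reader that skips nullValue for strings: the token comes back verbatim.
  def readIgnoringNullValue(token: String): String = token

  // Reader that honours nullValue for strings: the token is turned back into null.
  def readHonoringNullValue(token: String, nullValue: String): String =
    if (token == nullValue) null else token
}

val written = RoundTripSketch.write(null, nullValue = "null")
val lost = RoundTripSketch.readIgnoringNullValue(written)
val kept = RoundTripSketch.readHonoringNullValue(written, nullValue = "null")
```

Under these assumptions `lost` is the literal string "null" while `kept` is an actual null, which is the difference the comment above points to.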
Test build #57871 has finished for PR 12921 at commit
Test build #58045 has finished for PR 12921 at commit
Test build #58319 has finished for PR 12921 at commit
Hi @cloud-fan, could you please take a look?
Closing this since another PR has (I think) a better change. I may submit another PR to add some tests in the future.
What changes were proposed in this pull request?
Currently, the nullValue option does not work for some types: BooleanType, TimestampType, DateType and StringType. So there is currently no way to read null for those types. This PR adds that support, matching the other types.
Also, the CSV data source is not being tested with HadoopFsRelationTest as a HadoopFsRelation. HadoopFsRelationTest includes 50 more tests (e.g. partitioned table tests).
This PR adds two variables, extraReadOptions and extraWriteOptions, in HadoopFsRelationTest so that the child class can supply options for reading and writing. For the tests in HadoopFsRelationTest to pass, the CSV data source needs to set the options header and inferSchema to true for reading and header to true for writing.
How was this patch tested?
Unit tests in CSVHadoopFsRelationTest and CSVTypeCastSuite, and edited tests in CSVSuite.
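The extraReadOptions / extraWriteOptions hooks described above might look roughly like the sketch below. It is a standalone model, not the actual test suite: the trait and object names, the `readOptions` / `writeOptions` helpers and the use of `def` rather than a variable are all assumptions made so the example runs without Spark. Only the option names (header, inferSchema) and the hook names come from the description.

```scala
// Hypothetical base "test" that merges per-source options into its defaults.
trait RelationTestSketch {
  def dataSourceName: String

  // Hooks a child suite can override; empty by default.
  def extraReadOptions: Map[String, String] = Map.empty
  def extraWriteOptions: Map[String, String] = Map.empty

  // Options the shared tests would pass when reading or writing a path.
  def readOptions(path: String): Map[String, String] =
    Map("path" -> path) ++ extraReadOptions
  def writeOptions(path: String): Map[String, String] =
    Map("path" -> path) ++ extraWriteOptions
}

// Hypothetical CSV child suite supplying the options named in the description.
object CsvRelationTestSketch extends RelationTestSketch {
  override def dataSourceName: String = "csv"
  override def extraReadOptions: Map[String, String] =
    Map("header" -> "true", "inferSchema" -> "true")
  override def extraWriteOptions: Map[String, String] =
    Map("header" -> "true")
}
```

Keeping the defaults empty means existing child suites are unaffected, while a CSV suite can opt in to the header and schema-inference options the shared tests need.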