Skip to content

Conversation

@blrnw3
Copy link

@blrnw3 blrnw3 commented Mar 8, 2017

https://issues.apache.org/jira/browse/SPARK-19871

What changes were proposed in this pull request?

Improve error message in verify_type to indicate the field which is responsible for the verification error. This is incredibly useful for tracking down type/nullability errors.

Sample changes:

Before: This field is not nullable, but got None
After: This field (my_column, of type BooleanType) is not nullable, but got None

Before: FloatType can not accept object True in type bool
After: FloatType can not accept object True in type bool for field my_column

How was this patch tested?

Unit tests pass

Please review http://spark.apache.org/contributing.html before opening a pull request.

@blrnw3 blrnw3 changed the title [SPARK-13740] [PySpark][SQL] Improve error message in verify_type to indicate the field [SPARK-19871] [PySpark][SQL] Improve error message in verify_type to indicate the field Mar 8, 2017
@AmplabJenkins
Copy link

Can one of the admins verify this patch?



def _verify_type(obj, dataType, nullable=True):
def _verify_type(obj, dataType, nullable=True, name=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a doctest below.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Mar 9, 2017

I think we should run ./dev/lint-python. It seems some lines do not comply pep8 here. As a bonus, we could add before/after error messages in the PR description maybe.

@HyukjinKwon
Copy link
Member

cc @dgingrich who I guess the reporter of SPARK-19507 - what do you think about this?

@blrnw3
Copy link
Author

blrnw3 commented Mar 9, 2017

Ah yes it seems SPARK-19507 is a duplicate. My mistake. In any case I'm sure this PR satisfies that ticket

@blrnw3
Copy link
Author

blrnw3 commented Mar 9, 2017

Will add doctest and check linting.
Have added before/after error messages to the PR description

Copy link

@dgingrich dgingrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, and basically the same as what I did. I had some additional handling for nested structs and map/array elements which I think would be good to add. Here's my PR (incomplete, tests not done: #17227). I'm fine with either one being merged as long as I get better debug messages in the next release :) . Let me know what you want me to do.

if isinstance(obj, dict):
for f in dataType.fields:
_verify_type(obj.get(f.name), f.dataType, f.nullable)
_verify_type(obj.get(f.name), f.dataType, f.nullable, f.name)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work that well for nested structs:

MySubType = StructType([StructField('value', StringType(), nullable=False)])
MyType = StructType([
    StructField('one', MySubType),
    StructField('two', MySubType)])

_verify_type({'one': {'value': 'good'}, 'two': {'value': None}}, MyType)
# "This field (value, of type StringType) is not nullable, but got None"
# But is it one.value or two.value?

_verify_type(k, dataType.keyType, False)
_verify_type(v, dataType.valueType, dataType.valueContainsNull)
_verify_type(k, dataType.keyType, False, name)
_verify_type(v, dataType.valueType, dataType.valueContainsNull, name)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might also want to flag individual array/map elements.

@blrnw3
Copy link
Author

blrnw3 commented Mar 10, 2017

Yours looks more complete so I'm willing to close this if you finish it

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Mar 10, 2017

Let me then resolve this JIRA as a duplicate and reopen @dgingrich's one as soon as this one is closed.

@blrnw3 blrnw3 closed this Mar 10, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants