[SPARK-11997][SQL] NPE when save a DataFrame as parquet and partitioned by long column #10001
Conversation
@davies Can you please share your comments?
ok to test
@dilipbiswal Do you know why it is broken?
@yhuai Hi Yin, in the discoverPartitions method we build the partition spec and try to cast each partition value to the corresponding column type of the user-specified schema. When a partition value is null, the following code raises a null pointer exception in row.getString(i): Cast(Literal.create(row.getString(i), StringType), userProvidedSchema.fields(i).dataType).eval(). In this fix I check for null first and create a null literal of string type instead of calling row.getString(). Hope it is okay. Please let me know.
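The fix described above could be sketched as follows. This is a minimal illustration against Spark's Catalyst expression API, not the exact source: `row`, `i`, and `userProvidedSchema` stand for the locals in `discoverPartitions`.

```scala
import org.apache.spark.sql.catalyst.expressions.{Cast, Literal}
import org.apache.spark.sql.types.StringType

// Before: throws a NullPointerException when the partition value is null,
// because row.getString(i) dereferences a null internal string value.
// Cast(Literal.create(row.getString(i), StringType),
//      userProvidedSchema.fields(i).dataType).eval()

// After: check for null first and build a null literal of string type.
val partitionValue =
  if (row.isNullAt(i)) Literal.create(null, StringType)
  else Literal.create(row.getString(i), StringType)
Cast(partitionValue, userProvidedSchema.fields(i).dataType).eval()
```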
Test build #46777 has finished for PR 10001 at commit
I think it's better to check null in InternalRow.getString()
@davies Thanks for reviewing the change. In cases where we know in advance that the schema does not allow nulls, we can sometimes skip this null check. By moving it into InternalRow, would we lose the opportunity to optimize? Please let me know.
The null-checking is cheap.
Or we could use Literal.create(row.getUTF8String(i), StringType) to avoid the conversion between String and UTF8String.
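The suggested simplification could look like this (again a sketch, not the exact source). `getUTF8String` returns Spark's internal `UTF8String`, which is simply `null` for a null field rather than throwing, so both the explicit null check and the `String`/`UTF8String` round trip go away:

```scala
import org.apache.spark.sql.catalyst.expressions.{Cast, Literal}
import org.apache.spark.sql.types.StringType

// getUTF8String(i) yields null for a null field instead of throwing,
// and avoids converting the internal UTF8String to a java.lang.String.
Cast(Literal.create(row.getUTF8String(i), StringType),
     userProvidedSchema.fields(i).dataType).eval()
```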
Sounds good @davies. Made the change. Thanks a lot for your feedback.
@dilipbiswal I guess I did not ask my question clearly. I meant: why is 1.5 good but 1.6 broken? There must be a change that exposed this issue.
@yhuai Yeah, this function discoverPartitions() was changed recently (10/22/2015) as part of SPARK-9735.
LGTM
Test build #46791 has finished for PR 10001 at commit
…ned by long column

Check for partition column nullability while building the partition spec.

Author: Dilip Biswal <[email protected]>
Closes #10001 from dilipbiswal/spark-11997.
(cherry picked from commit a374e20)
Signed-off-by: Davies Liu <[email protected]>