[SPARK-11997][SQL] NPE when save a DataFrame as parquet and partitioned by long column #10001
Conversation
@davies Can you please share your comments?
ok to test
@dilipbiswal Do you know why it is broken?
@yhuai Hi Yin, in the discoverPartitions method we build the partition spec and try to cast each partition value to the corresponding column type of the user-specified schema. When a partition value is null, the following code raises a null pointer exception in row.getString(i): Cast(Literal.create(row.getString(i), StringType), userProvidedSchema.fields(i).dataType).eval(). In this fix I check for null first and create a null literal of string type instead of calling row.getString(). Hope it is okay. Please let me know.
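The fix described above could be sketched as follows. This is a minimal illustration against Spark's Catalyst expression API, not the exact source: `row`, `i`, and `userProvidedSchema` stand for the locals in `discoverPartitions`.

```scala
import org.apache.spark.sql.catalyst.expressions.{Cast, Literal}
import org.apache.spark.sql.types.StringType

// Before: throws a NullPointerException when the partition value is null,
// because row.getString(i) dereferences a null internal string value.
// Cast(Literal.create(row.getString(i), StringType),
//      userProvidedSchema.fields(i).dataType).eval()

// After: check for null first and build a null literal of string type.
val partitionValue =
  if (row.isNullAt(i)) Literal.create(null, StringType)
  else Literal.create(row.getString(i), StringType)
Cast(partitionValue, userProvidedSchema.fields(i).dataType).eval()
```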
Test build #46777 has finished for PR 10001 at commit
I think it's better to check null in InternalRow.getString()
@davies Thanks for reviewing the change. In cases where we know in advance that the schema does not allow nulls, we can sometimes skip this null check. By moving it into InternalRow, would we lose the opportunity to optimize? Please let me know.
The null-checking is cheap.
Or we could use Literal.create(row.getUTF8String(i), StringType) to avoid the conversion between String and UTF8String.
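The suggested simplification could look like this (again a sketch, not the exact source). `getUTF8String` returns Spark's internal `UTF8String`, which is simply `null` for a null field rather than throwing, so both the explicit null check and the `String`/`UTF8String` round trip go away:

```scala
import org.apache.spark.sql.catalyst.expressions.{Cast, Literal}
import org.apache.spark.sql.types.StringType

// getUTF8String(i) yields null for a null field instead of throwing,
// and avoids converting the internal UTF8String to a java.lang.String.
Cast(Literal.create(row.getUTF8String(i), StringType),
     userProvidedSchema.fields(i).dataType).eval()
```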
Sounds good @davies. Made the change. Thanks a lot for your feedback.
@dilipbiswal I guess I did not ask my question clearly. I meant: why is 1.5 good but 1.6 broken? There must be a change that exposed this issue.
@yhuai Yeah, this function discoverPartitions() was changed recently (10/22/2015) as part of SPARK-9735.
LGTM
Test build #46791 has finished for PR 10001 at commit
…ned by long column

Check for partition column nullability while building the partition spec.

Author: Dilip Biswal <[email protected]>
Closes #10001 from dilipbiswal/spark-11997.
(cherry picked from commit a374e20)
Signed-off-by: Davies Liu <[email protected]>