[SPARK-19809][SQL][TEST] NullPointerException on zero-size ORC file #19948
Test build #84747 has finished for PR 19948 at commit
So if
retest this please
Test build #84755 has finished for PR 19948 at commit
retest this please
Test build #84758 has finished for PR 19948 at commit
Thank you, @HyukjinKwon and @viirya .
@viirya .
Hey @dongjoon-hyun, I am going to merge this but let's leave a comment about ^ in the JIRA too for clarification.
Merged to master.
  }
}

test("SPARK-19809 NullPointerException on zero-size ORC file") {
This test case should not be put in SQLQuerySuite.scala
BTW, @gatorsmile . Which suite do you prefer? So far, this test case covers
- Both native and hive
- Also, STORED AS with CONVERT_METASTORE_ORC=true
I'll move this into HiveOrcQuerySuite.
Sure. That sounds fine to me.
@dongjoon-hyun Could you submit a follow-up PR to move the test case?
Files.touch(new File(s"${dir.getCanonicalPath}", "zero.orc"))

withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") { // default since 2.3.0
  checkAnswer(sql("SELECT * FROM spark_19809"), Seq.empty)
Use table("spark_19809")
It's done.
sql(s"CREATE TABLE spark_19809(a int) STORED AS ORC LOCATION '$dir'")
Files.touch(new File(s"${dir.getCanonicalPath}", "zero.orc"))

withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") { // default since 2.3.0
Use both true and false
Ur, this test case is for convertMetastoreOrc=true, which became the default in Spark 2.3.0.
false still has the issue in Hive 1.2.1 OrcInputFormat.getSplits, so I wrote the following in the PR description.
After SPARK-22279, Apache Spark with the default configuration doesn't have this bug. Although Hive 1.2.1 library code path still has the problem, we had better have a test coverage on what we have now in order to prevent future regression on it.
Add a comment in the test case?
Sure!
Thank you, @HyukjinKwon and @gatorsmile . I'll.
…Suite
## What changes were proposed in this pull request?
As a follow-up of apache#19948 , this PR moves the test case and adds comments.
## How was this patch tested?
Pass the Jenkins.
Author: Dongjoon Hyun <[email protected]>
Closes apache#19960 from dongjoon-hyun/SPARK-19809-2.
What changes were proposed in this pull request?
Until 2.2.1, Spark raises NullPointerException on zero-size ORC files. Usually, these zero-size ORC files are generated by 3rd-party apps like Flume.
After SPARK-22279, Apache Spark with the default configuration doesn't have this bug. Although Hive 1.2.1 library code path still has the problem, we had better have a test coverage on what we have now in order to prevent future regression on it.
How was this patch tested?
Pass a newly added test case.
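For readers who want to reproduce the fixture, the "zero-size ORC file" in the test is just an empty file named zero.orc placed in the table location. The following is a minimal sketch of that setup step only, using the JDK instead of the Guava Files.touch call seen in the diff. The object name ZeroSizeOrcFixture, its touch helper, and the temp-directory prefix are hypothetical and not part of the PR; the sketch does not exercise Spark itself.

```scala
import java.nio.file.{Files, Path}

// Hypothetical helper, not part of the PR: creates an empty file that stands in
// for the zero-size ORC files mentioned in the description.
object ZeroSizeOrcFixture {
  /** Creates `name` inside `dir` as an empty file and returns its path. */
  def touch(dir: Path, name: String): Path =
    Files.createFile(dir.resolve(name))

  def main(args: Array[String]): Unit = {
    // Assumed prefix; any temporary directory works as a table location.
    val dir = Files.createTempDirectory("spark_19809")
    val zero = touch(dir, "zero.orc")
    // The file exists but contains no bytes, so it carries no ORC footer or schema.
    println(s"${zero.getFileName} size = ${Files.size(zero)}")
    // Clean up the fixture.
    Files.delete(zero)
    Files.delete(dir)
  }
}
```

Pointing a table's LOCATION at such a directory, as the test in the diff does, is what reproduces the condition under discussion.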