-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-14400] [SQL] ScriptTransformation does not fail the job for bad user command #12194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
ok to test |
|
Test build #55074 has finished for PR 12194 at commit
|
|
cc @srowen |
|
Test build #55358 has finished for PR 12194 at commit
|
e3899c3 to
ebb5ea1
Compare
|
ok to test |
|
Test build #55360 has finished for PR 12194 at commit
|
ebb5ea1 to
1054b71
Compare
|
ok to test |
|
Test build #55676 has finished for PR 12194 at commit
|
1054b71 to
e37e0aa
Compare
|
ok to test |
|
Test build #58286 has finished for PR 12194 at commit
|
e37e0aa to
524eb71
Compare
|
ok to test |
|
Test build #58720 has finished for PR 12194 at commit
|
524eb71 to
abd65d8
Compare
|
ok to test |
|
Test build #58750 has finished for PR 12194 at commit
|
|
Can anyone please review this PR ? |
1 similar comment
|
Can anyone please review this PR ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated to this PR but we should probably only print the partial contents of the circular buffer if the number of bytes written to it are less than its total size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sameeragarwal : Thanks for pointing that out. I will submit a separate PR for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here : #13351
|
This patch looks good to me. @srowen / @JoshRosen can one of you please take a second look as well? |
|
@srowen want to review this? Given you reviewed the last transform pr. |
|
@tejasapatil on a related note, we'd want to remove as much hive code dependency as possible. One command that is left is ScriptTransformation. Would you have time to implement this in sql/core without Hive dependency for ScriptTransformation? It seems like one of the primary operators you guys would use. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, is it me or does this get hard to follow the return values. Generally the method returns "true" unless one of several conditions caused it to decide it was finished earlier. Those could be handled with early "return false" rather than lots of "else ... true" branches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea i think that's a great idea. the level of nesting is a little bit too much here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Did the change
|
With one minor suggestion this LGTM |
|
@tejasapatil want to update this so we can merge it for 2.0? |
- Added test case
…read might be the first to see the effect of process being killed ``` Exception in thread "Thread-ScriptTransformation-Feed" java.io.IOException: Broken pipe at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:326) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at java.io.FilterOutputStream.close(FilterOutputStream.java:158) at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply$mcV$sp(ScriptTransformation.scala:307) at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply(ScriptTransformation.scala:268) at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply(ScriptTransformation.scala:268) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1793) at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread.run(ScriptTransformation.scala:268) ```
abd65d8 to
4a7b2e8
Compare
|
@rxin : Sorry for delay... was caught up in some other things. I have updated the PR now with the review comments. Re your suggestion about removing hive code dependency: I will work on it |
|
LGTM pending jenkins. Thanks! |
|
Test build #59466 has finished for PR 12194 at commit
|
|
Merging in master/2.0. |
… user command ## What changes were proposed in this pull request? - Refer to the Jira for the problem: jira : https://issues.apache.org/jira/browse/SPARK-14400 - The fix is to check if the process has exited with a non-zero exit code in `hasNext()`. I have moved this and checking of writer thread exception to a separate method. ## How was this patch tested? - Ran a job which had incorrect transform script command and saw that the job fails - Existing unit tests for `ScriptTransformationSuite`. Added a new unit test Author: Tejas Patil <[email protected]> Closes #12194 from tejasapatil/script_transform. (cherry picked from commit a96e415) Signed-off-by: Reynold Xin <[email protected]>
…tents written if buffer isn't full ## What changes were proposed in this pull request? 1. The class allocated 4x space than needed as it was using `Int` to store the `Byte` values 2. If CircularBuffer isn't full, currently toString() will print some garbage chars along with the content written as is tries to print the entire array allocated for the buffer. The fix is to keep track of buffer getting full and don't print the tail of the buffer if it isn't full (suggestion by sameeragarwal over #12194 (comment)) 3. Simplified `toString()` ## How was this patch tested? Added new test case Author: Tejas Patil <[email protected]> Closes #13351 from tejasapatil/circular_buffer. (cherry picked from commit ac38bdc) Signed-off-by: Sean Owen <[email protected]>
…tents written if buffer isn't full ## What changes were proposed in this pull request? 1. The class allocated 4x space than needed as it was using `Int` to store the `Byte` values 2. If CircularBuffer isn't full, currently toString() will print some garbage chars along with the content written as is tries to print the entire array allocated for the buffer. The fix is to keep track of buffer getting full and don't print the tail of the buffer if it isn't full (suggestion by sameeragarwal over #12194 (comment)) 3. Simplified `toString()` ## How was this patch tested? Added new test case Author: Tejas Patil <[email protected]> Closes #13351 from tejasapatil/circular_buffer.
…tents written if buffer isn't full 1. The class allocated 4x space than needed as it was using `Int` to store the `Byte` values 2. If CircularBuffer isn't full, currently toString() will print some garbage chars along with the content written as is tries to print the entire array allocated for the buffer. The fix is to keep track of buffer getting full and don't print the tail of the buffer if it isn't full (suggestion by sameeragarwal over #12194 (comment)) 3. Simplified `toString()` Added new test case Author: Tejas Patil <[email protected]> Closes #13351 from tejasapatil/circular_buffer. (cherry picked from commit ac38bdc) Signed-off-by: Sean Owen <[email protected]>
…tents written if buffer isn't full 1. The class allocated 4x space than needed as it was using `Int` to store the `Byte` values 2. If CircularBuffer isn't full, currently toString() will print some garbage chars along with the content written as is tries to print the entire array allocated for the buffer. The fix is to keep track of buffer getting full and don't print the tail of the buffer if it isn't full (suggestion by sameeragarwal over apache#12194 (comment)) 3. Simplified `toString()` Added new test case Author: Tejas Patil <[email protected]> Closes apache#13351 from tejasapatil/circular_buffer. (cherry picked from commit ac38bdc) Signed-off-by: Sean Owen <[email protected]> (cherry picked from commit 714f4d7)
What changes were proposed in this pull request?
hasNext(). I have moved this and checking of writer thread exception to a separate method.How was this patch tested?
ScriptTransformationSuite. Added a new unit test