@tejasapatil
Contributor

What changes were proposed in this pull request?

  • See the Jira for the problem: https://issues.apache.org/jira/browse/SPARK-14400
  • The fix is to check in hasNext() whether the process has exited with a non-zero exit code. I have moved this check, along with the check for a writer thread exception, into a separate method.
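
As a rough illustration of the idea (not the code in this patch), the sketch below wraps a child process in an iterator whose hasNext checks the exit status once the output stream is exhausted. The class name `ProcessOutputIterator`, the helper `checkExitStatus`, and the use of a POSIX `sh` in the usage note are assumptions made for the example.

```scala
import java.io.{BufferedReader, InputStreamReader}

// Hypothetical wrapper, not Spark's ScriptTransformation: iterates over the
// lines a child process writes to stdout and fails if the process exits non-zero.
class ProcessOutputIterator(command: Seq[String]) extends Iterator[String] {
  private val process = new ProcessBuilder(command: _*)
    .redirectError(ProcessBuilder.Redirect.INHERIT) // avoid blocking on an undrained stderr
    .start()
  private val reader =
    new BufferedReader(new InputStreamReader(process.getInputStream))
  private var nextLine: String = null

  // Hypothetical helper: called once stdout is exhausted. A non-zero exit code
  // means the command failed, so raise instead of silently returning no rows.
  private def checkExitStatus(): Unit = {
    val exitCode = process.waitFor()
    if (exitCode != 0) {
      throw new RuntimeException(s"Subprocess exited with status $exitCode")
    }
  }

  override def hasNext: Boolean = {
    if (nextLine != null) return true
    nextLine = reader.readLine()
    if (nextLine == null) {
      checkExitStatus()
      return false
    }
    true
  }

  override def next(): String = {
    if (!hasNext) throw new NoSuchElementException("end of process output")
    val line = nextLine
    nextLine = null
    line
  }
}
```

With this shape, `new ProcessOutputIterator(Seq("sh", "-c", "echo a; exit 3")).toList` raises an exception instead of returning a short result, which is the behavior the fix is after.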

How was this patch tested?

  • Ran a job with an incorrect transform script command and confirmed that the job fails
  • Ran the existing unit tests in ScriptTransformationSuite and added a new unit test

@tejasapatil
Contributor Author

ok to test

@SparkQA

SparkQA commented Apr 6, 2016

Test build #55074 has finished for PR 12194 at commit df48e1e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Apr 6, 2016

cc @srowen

@SparkQA

SparkQA commented Apr 8, 2016

Test build #55358 has finished for PR 12194 at commit e3899c3.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tejasapatil
Contributor Author

ok to test

@SparkQA

SparkQA commented Apr 8, 2016

Test build #55360 has finished for PR 12194 at commit ebb5ea1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tejasapatil
Contributor Author

ok to test

@SparkQA

SparkQA commented Apr 13, 2016

Test build #55676 has finished for PR 12194 at commit 1054b71.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tejasapatil
Contributor Author

ok to test

@SparkQA

SparkQA commented May 10, 2016

Test build #58286 has finished for PR 12194 at commit e37e0aa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tejasapatil
Contributor Author

ok to test

@SparkQA

SparkQA commented May 17, 2016

Test build #58720 has finished for PR 12194 at commit 524eb71.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tejasapatil
Contributor Author

ok to test

@SparkQA

SparkQA commented May 18, 2016

Test build #58750 has finished for PR 12194 at commit abd65d8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tejasapatil
Contributor Author

Can anyone please review this PR?

Member

Unrelated to this PR, but we should probably print only the partial contents of the circular buffer if the number of bytes written to it is less than its total size.

Contributor Author

@sameeragarwal : Thanks for pointing that out. I will submit a separate PR for that.

Contributor Author

here : #13351

@sameeragarwal
Member

This patch looks good to me. @srowen / @JoshRosen can one of you please take a second look as well?

@rxin
Contributor

rxin commented May 25, 2016

@srowen want to review this, given you reviewed the last transform PR?

@rxin
Contributor

rxin commented May 25, 2016

@tejasapatil on a related note, we want to remove as much of the Hive code dependency as possible. One command that is left is ScriptTransformation. Would you have time to implement ScriptTransformation in sql/core without the Hive dependency? It seems like one of the primary operators you guys would use.

Member

Hm, is it me or do the return values get hard to follow here? Generally the method returns "true" unless one of several conditions causes it to decide earlier that it is finished. Those could be handled with an early "return false" rather than lots of "else ... true" branches.

Contributor

Yeah, I think that's a great idea. The level of nesting is a bit too much here.

Contributor Author

+1. Made the change.
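
For readers following this thread, the two styles under discussion can be sketched roughly as below. This is an illustration only: `ReaderState` and both method names are hypothetical stand-ins, not the conditions or code in ScriptTransformation.

```scala
object EarlyReturnSketch {
  // Hypothetical stand-ins for the conditions a reader checks; not Spark's code.
  final case class ReaderState(streamClosed: Boolean, endOfInput: Boolean)

  // Nested form: the outcome is spread across if/else branches,
  // so the reader has to track the nesting to find each result.
  def hasNextNested(s: ReaderState): Boolean = {
    if (s.streamClosed) {
      false
    } else {
      if (s.endOfInput) {
        false
      } else {
        true
      }
    }
  }

  // Early-return form: each terminating condition exits immediately,
  // and the default result appears once at the end.
  def hasNextEarlyReturn(s: ReaderState): Boolean = {
    if (s.streamClosed) return false
    if (s.endOfInput) return false
    true
  }
}
```

Both methods return the same value for every input; the second keeps the nesting flat as more terminating conditions are added.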

@srowen
Member

srowen commented May 26, 2016

With one minor suggestion this LGTM

@rxin
Contributor

rxin commented May 26, 2016

@tejasapatil want to update this so we can merge it for 2.0?

…read might be the first to see the effect of process being killed

```
Exception in thread "Thread-ScriptTransformation-Feed" java.io.IOException: Broken pipe
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:326)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
	at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
	at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply$mcV$sp(ScriptTransformation.scala:307)
	at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply(ScriptTransformation.scala:268)
	at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply(ScriptTransformation.scala:268)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1793)
	at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread.run(ScriptTransformation.scala:268)
```
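
The trace above shows the feeding thread hitting the broken pipe. One way a consumer can learn about such a failure is for the writer thread to record its exception in a field that the reading side checks. The sketch below illustrates that pattern only; `FeederThread` and its `exception` accessor are hypothetical names, not Spark's ScriptTransformationWriterThread.

```scala
import java.io.{IOException, OutputStream}
import scala.util.control.NonFatal

// Hypothetical feeder thread: writes rows to the child process's stdin and
// remembers any failure so that the consuming thread can inspect it later.
class FeederThread(out: OutputStream, rows: Iterator[String])
    extends Thread("feeder-sketch") {

  // Written by this thread, read by the consumer thread, hence @volatile.
  @volatile private var failure: Option[Throwable] = None

  def exception: Option[Throwable] = failure

  override def run(): Unit = {
    try {
      rows.foreach(row => out.write((row + "\n").getBytes("UTF-8")))
      out.flush()
    } catch {
      // e.g. "Broken pipe" when the child process has already died
      case NonFatal(e) => failure = Some(e)
    } finally {
      // Closing can fail for the same reason; the first failure is what matters.
      try out.close() catch { case _: IOException => () }
    }
  }
}
```

A reader would typically check `exception` when it runs out of input and rethrow whatever it finds there.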
@tejasapatil
Contributor Author

@rxin : Sorry for the delay... I was caught up in some other things. I have now updated the PR to address the review comments.

Re your suggestion about removing the Hive code dependency: I will work on it.

@sameeragarwal
Member

LGTM pending jenkins. Thanks!

@SparkQA

SparkQA commented May 27, 2016

Test build #59466 has finished for PR 12194 at commit 4a7b2e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented May 27, 2016

Merging in master/2.0.

asfgit pushed a commit that referenced this pull request May 27, 2016
… user command

## What changes were proposed in this pull request?

- See the Jira for the problem: https://issues.apache.org/jira/browse/SPARK-14400
- The fix is to check in `hasNext()` whether the process has exited with a non-zero exit code. I have moved this check, along with the check for a writer thread exception, into a separate method.

## How was this patch tested?

- Ran a job with an incorrect transform script command and confirmed that the job fails
- Ran the existing unit tests in `ScriptTransformationSuite` and added a new unit test

Author: Tejas Patil <[email protected]>

Closes #12194 from tejasapatil/script_transform.

(cherry picked from commit a96e415)
Signed-off-by: Reynold Xin <[email protected]>
@asfgit asfgit closed this in a96e415 May 27, 2016
asfgit pushed a commit that referenced this pull request Jun 1, 2016
…tents written if buffer isn't full

## What changes were proposed in this pull request?

1. The class allocated 4x the space needed because it used `Int` to store the `Byte` values

2. If CircularBuffer isn't full, toString() currently prints some garbage characters along with the content written, because it tries to print the entire array allocated for the buffer. The fix is to track whether the buffer has become full and to skip the tail of the buffer when it hasn't (suggested by sameeragarwal in #12194 (comment))

3. Simplified `toString()`
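
The three points above can be sketched roughly as follows. This is a simplified illustration under assumed names (`CircularBufferSketch`, `isBufferFull`), not the class as committed: it stores bytes in an `Array[Byte]`, tracks whether the buffer has wrapped, and prints only the written prefix until it has.

```scala
import java.io.OutputStream
import java.nio.charset.StandardCharsets

// Hypothetical sketch, not Spark's CircularBuffer: keeps the last `size` bytes written.
class CircularBufferSketch(size: Int) extends OutputStream {
  require(size > 0, "buffer size must be positive")

  private val buffer = new Array[Byte](size) // Byte rather than Int: one byte per slot
  private var pos = 0                        // next slot to overwrite
  private var isBufferFull = false           // set once the write position wraps

  override def write(b: Int): Unit = {
    buffer(pos) = b.toByte
    pos = (pos + 1) % size
    if (pos == 0) isBufferFull = true
  }

  override def toString: String = {
    if (!isBufferFull) {
      // Only the first `pos` slots hold real data; the rest were never written.
      new String(buffer, 0, pos, StandardCharsets.UTF_8)
    } else {
      // The oldest byte sits at `pos`; stitch tail and head back into write order.
      val ordered = buffer.slice(pos, size) ++ buffer.slice(0, pos)
      new String(ordered, StandardCharsets.UTF_8)
    }
  }
}
```

Writing "abc" into a buffer of size 8 prints just "abc", while writing "abcdef" into a buffer of size 4 prints the last four bytes, "cdef".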

## How was this patch tested?

Added new test case

Author: Tejas Patil <[email protected]>

Closes #13351 from tejasapatil/circular_buffer.

(cherry picked from commit ac38bdc)
Signed-off-by: Sean Owen <[email protected]>
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Jun 1, 2016
…tents written if buffer isn't full

1. The class allocated 4x the space needed because it used `Int` to store the `Byte` values

2. If CircularBuffer isn't full, toString() currently prints some garbage characters along with the content written, because it tries to print the entire array allocated for the buffer. The fix is to track whether the buffer has become full and to skip the tail of the buffer when it hasn't (suggested by sameeragarwal in apache#12194 (comment))

3. Simplified `toString()`

Added new test case

Author: Tejas Patil <[email protected]>

Closes apache#13351 from tejasapatil/circular_buffer.

(cherry picked from commit ac38bdc)
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit 714f4d7)