Skip to content

Conversation

@AngersZhuuuu
Copy link
Contributor

@AngersZhuuuu AngersZhuuuu commented Dec 30, 2020

What changes were proposed in this pull request?

In hive we always use

add file /path/to/script.py;
select transform(col1, col2, ..)
using 'script.py' as (col1, col2, ...)
from ...

Since in spark we wrapper script command with /bash/bin -c, in this case we will throw script.py command not found.

This pr add a SparkFile's root dir path to execution env property PATH, then sub-processor will find scrip.py as program under PATH.

Why are the changes needed?

Support SQL migration form Hive to Spark.

Does this PR introduce any user-facing change?

User can direct use script file name as program in script transform SQL.

add file /path/to/script.py;
select transform(col1, col2, ..)
using 'script.py' as (col1, col2, ...)
from ...

How was this patch tested?

UT

@AngersZhuuuu AngersZhuuuu marked this pull request as draft December 30, 2020 09:50
@github-actions github-actions bot added the SQL label Dec 30, 2020
@SparkQA
Copy link

SparkQA commented Dec 30, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38118/

@SparkQA
Copy link

SparkQA commented Dec 30, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38118/

@SparkQA
Copy link

SparkQA commented Dec 30, 2020

Test build #133528 has finished for PR 30973 at commit c39e3cf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu AngersZhuuuu marked this pull request as ready for review December 30, 2020 15:04
@AngersZhuuuu AngersZhuuuu changed the title [SPARK-33934][SQL] Support user defined script command wrapper like hive [SPARK-33934][SQL] Support user defined script command wrapper and handle cmd arg like hive Dec 30, 2020
@SparkQA
Copy link

SparkQA commented Dec 30, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38125/

@SparkQA
Copy link

SparkQA commented Dec 30, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38128/

@SparkQA
Copy link

SparkQA commented Dec 30, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38125/

@SparkQA
Copy link

SparkQA commented Dec 30, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38128/

@SparkQA
Copy link

SparkQA commented Dec 30, 2020

Test build #133535 has finished for PR 30973 at commit 525b51e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 31, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38153/

@SparkQA
Copy link

SparkQA commented Dec 31, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38153/

@SparkQA
Copy link

SparkQA commented Dec 31, 2020

Test build #133563 has finished for PR 30973 at commit 0295501.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 3, 2021

Test build #133605 has started for PR 30973 at commit f406844.

@SparkQA
Copy link

SparkQA commented Jan 3, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38194/

@SparkQA
Copy link

SparkQA commented Jan 3, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38194/

@maropu
Copy link
Member

maropu commented Jan 4, 2021

This new feature itself looks useful. Btw, are you planning to make a PR to add a dedicated SQL document page for the TRANSFORM-related functionality, @AngersZhuuuu ? I think we need to document how to use the TRANSFORM clause for users. If a jira ticket for it doesn't exist, could you file it first?

@AngersZhuuuu
Copy link
Contributor Author

This new feature itself looks useful. Btw, are you planning to make a PR to add a dedicated SQL document page for the TRANSFORM-related functionality, @AngersZhuuuu ? I think we need to document how to use the TRANSFORM clause for users. If a jira ticket for it doesn't exist, could you file it first?

Yea, I have planed to do this since a lot change done. #29087 (comment)
I have file a ticket. https://issues.apache.org/jira/browse/SPARK-33976

@HyukjinKwon
Copy link
Member

LGTM otherwise

@AngersZhuuuu AngersZhuuuu changed the title [SPARK-33934][SQL] Support user defined script command wrapper and handle cmd arg like hive [SPARK-33934][SQL] Add SparkFile's root dir to env property PATH Jan 4, 2021
@maropu
Copy link
Member

maropu commented Jan 4, 2021

Please update the PR description, too. Looks fine otherwise.

@AngersZhuuuu
Copy link
Contributor Author

Please update the PR description, too. Looks fine otherwise.

Done

@AngersZhuuuu
Copy link
Contributor Author

retest this please

@HyukjinKwon
Copy link
Member

Merged to master.

@dongjoon-hyun
Copy link
Member

Hi, all.
This seems to add a very flaky test. Could you take a look at that, please?

Screen Shot 2021-01-05 at 11 32 33 AM

package org.apache.spark.sql.execution

import java.io.{BufferedReader, InputStream, InputStreamReader, OutputStream}
import java.io._
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AngersZhuuuu . nit. Please enumerate all next time.

@@ -1,3 +1,5 @@
#! /usr/bin/python
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, /usr/bin/python3?

@AngersZhuuuu
Copy link
Contributor Author

Hi, all.
This seems to add a very flaky test. Could you take a look at that, please?

Screen Shot 2021-01-05 at 11 32 33 AM

Seems. same issue as last time. I will raise a fix. pr soon.

dongjoon-hyun pushed a commit that referenced this pull request Jan 6, 2021
…condition to fix flaky test

### What changes were proposed in this pull request?
Follow comment and fix. flaky test #30973 (comment).
This flaky test is similar as #30896

Some task's failed with root cause but in driver may return error without root cause , change. UT to check with status exit code since different root cause's exit code is not same.

### Why are the changes needed?
Fix flaky test

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existed UT

Closes #31046 from AngersZhuuuu/SPARK-33934-FOLLOW-UP.

Lead-authored-by: angerszhu <[email protected]>
Co-authored-by: AngersZhuuuu <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants