Skip to content

Conversation

@mstewart141
Copy link
Contributor

@mstewart141 mstewart141 commented Mar 3, 2018

What changes were proposed in this pull request?

Check python version to determine whether to use inspect.getargspec or inspect.getfullargspec before applying pandas_udf core logic to a function. The former is python2.7 (deprecated in python3) and the latter is python3.x. The latter correctly accounts for type annotations, which are syntax errors in python2.x.

How was this patch tested?

Locally, on python 2.7 and 3.6.

@mstewart141
Copy link
Contributor Author

cc @HyukjinKwon
👍

@HyukjinKwon
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Mar 3, 2018

Test build #87938 has finished for PR 20728 at commit 3cd53f3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mstewart141
Copy link
Contributor Author

what should next step be here?

@HyukjinKwon
Copy link
Member

Usually I leave it open for few days so that I or other reviewers can check this change. I or other reviewers will leave some review comments, or leave an approval on this PR if it looks good without additional changes. Will try to guide you explicitly here.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Mar 4, 2018

I was just double checking if we can write a test. Mind adding the test below if it makes sense?

diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index 19653072ea3..c46423ac905 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -4381,6 +4381,24 @@ class ScalarPandasUDFTests(ReusedSQLTestCase):
         result = df.withColumn('time', foo_udf(df.time))
         self.assertEquals(df.collect(), result.collect())

+    @unittest.skipIf(sys.version_info[:2] < (3, 5), "Type hints are supported from Python 3.5.")
+    def test_type_annotation(self):
+        from pyspark.sql.functions import pandas_udf
+        # Regression test to check if type hints can be used. See SPARK-23569.
+        # Note that it throws an error during compilation in lower Python versions if 'exec'
+        # is not used. Also, note that we explicitly use another dictionary to avoid modifications
+        # in the current 'locals()'.
+        #
+        # Hyukjin: I think it's an ugly way to test issues about syntax specific in
+        # higher versions of Python, which we shouldn't encourage. This was the last resort
+        # I could come up with at that time.
+        _locals = {}
+        exec(
+            "import pandas as pd\ndef noop(col: pd.Series) -> pd.Series: return col",
+            _locals)
+        df = self.spark.range(1).select(pandas_udf(f=_locals['noop'], returnType='bigint')('id'))
+        self.assertEqual(df.first()[0], 0)
+

 @unittest.skipIf(
     not _have_pandas or not _have_pyarrow,

if sys.version_info[0] < 3:
argspec = inspect.getargspec(f)
else:
argspec = inspect.getfullargspec(f)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add a small comment while we are here like 'getargspec ' is deprecated since version 3.0 and calling it with type hints causes an actual issue. See SPARK-23569?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can do.

@HyukjinKwon
Copy link
Member

cc @ueshin, @BryanCutler, @icexelloss FYI.

@mstewart141
Copy link
Contributor Author

mstewart141 commented Mar 4, 2018

@HyukjinKwon your test definitely makes sense; yea the syntax error in py2 part is why i wasn't sure how to go about testing this in the first place. this certainly gets the job done.

@SparkQA
Copy link

SparkQA commented Mar 4, 2018

Test build #87948 has finished for PR 20728 at commit 0395690.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@HyukjinKwon
Copy link
Member

Will merge this one if there are no more comments in few days.

@ueshin
Copy link
Member

ueshin commented Mar 5, 2018

LGTM.

@HyukjinKwon
Copy link
Member

Merged to master and branch-2.3.

asfgit pushed a commit that referenced this pull request Mar 5, 2018
…e-annotated functions

## What changes were proposed in this pull request?

Check python version to determine whether to use `inspect.getargspec` or `inspect.getfullargspec` before applying `pandas_udf` core logic to a function. The former is python2.7 (deprecated in python3) and the latter is python3.x. The latter correctly accounts for type annotations, which are syntax errors in python2.x.

## How was this patch tested?

Locally, on python 2.7 and 3.6.

Author: Michael (Stu) Stewart <[email protected]>

Closes #20728 from mstewart141/pandas_udf_fix.

(cherry picked from commit 7965c91)
Signed-off-by: hyukjinkwon <[email protected]>
@asfgit asfgit closed this in 7965c91 Mar 5, 2018
@icexelloss
Copy link
Contributor

LGTM too

peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
…e-annotated functions

## What changes were proposed in this pull request?

Check python version to determine whether to use `inspect.getargspec` or `inspect.getfullargspec` before applying `pandas_udf` core logic to a function. The former is python2.7 (deprecated in python3) and the latter is python3.x. The latter correctly accounts for type annotations, which are syntax errors in python2.x.

## How was this patch tested?

Locally, on python 2.7 and 3.6.

Author: Michael (Stu) Stewart <[email protected]>

Closes apache#20728 from mstewart141/pandas_udf_fix.

(cherry picked from commit 7965c91)
Signed-off-by: hyukjinkwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants