[SPARK-23569][PYTHON] Allow pandas_udf to work with python3 style type-annotated functions #20728
…e-annotated functions
|
cc @HyukjinKwon |
|
ok to test |
|
Test build #87938 has finished for PR 20728 at commit
|
|
What should the next step be here? |
|
Usually I leave a PR open for a few days so that I or other reviewers can check the change. We will either leave review comments or approve the PR if it looks good without additional changes. I will try to guide you explicitly here. |
|
I was just double checking if we can write a test. Mind adding the test below if it makes sense?
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index 19653072ea3..c46423ac905 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -4381,6 +4381,24 @@ class ScalarPandasUDFTests(ReusedSQLTestCase):
         result = df.withColumn('time', foo_udf(df.time))
         self.assertEquals(df.collect(), result.collect())
+    @unittest.skipIf(sys.version_info[:2] < (3, 5), "Type hints are supported from Python 3.5.")
+    def test_type_annotation(self):
+        from pyspark.sql.functions import pandas_udf
+        # Regression test to check if type hints can be used. See SPARK-23569.
+        # Note that it throws an error during compilation in lower Python versions if 'exec'
+        # is not used. Also, note that we explicitly use another dictionary to avoid modifications
+        # in the current 'locals()'.
+        #
+        # Hyukjin: I think it's an ugly way to test issues about syntax specific in
+        # higher versions of Python, which we shouldn't encourage. This was the last resort
+        # I could come up with at that time.
+        _locals = {}
+        exec(
+            "import pandas as pd\ndef noop(col: pd.Series) -> pd.Series: return col",
+            _locals)
+        df = self.spark.range(1).select(pandas_udf(f=_locals['noop'], returnType='bigint')('id'))
+        self.assertEqual(df.first()[0], 0)
+
     @unittest.skipIf(
         not _have_pandas or not _have_pyarrow,
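For anyone who wants to try the `exec` trick from the suggested test without a Spark session, here is a minimal sketch. It is only an illustration under stated assumptions: `int` stands in for `pd.Series` so that pandas is not required, the names `source` and `namespace` are made up for this example, and it is meant to be run on Python 3.

```python
import inspect

# Assumption for illustration: `int` replaces `pd.Series` so pandas is not needed.
# Keeping the annotated definition inside a string means a Python 2 parser
# never sees the annotation syntax when it compiles the surrounding file.
source = "def noop(col: int) -> int: return col"

# A separate dictionary receives the definition, so the caller's own
# namespace is not modified by exec.
namespace = {}
exec(source, namespace)

noop = namespace["noop"]
print(noop(7))
print(inspect.getfullargspec(noop).args)
```

The function only exists inside `namespace` until it is looked up explicitly, which is why the test reads it back with `_locals['noop']`.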
if sys.version_info[0] < 3:
    argspec = inspect.getargspec(f)
else:
    argspec = inspect.getfullargspec(f)
Shall we add a small comment while we are here, something like: "`getargspec` is deprecated since version 3.0 and calling it with type hints causes an actual issue. See SPARK-23569."?
can do.
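To make the issue concrete, here is a small sketch of the difference between the two calls on an annotated function. The function `noop` and the variable names are illustrative only, and `int` is used instead of `pd.Series` to avoid a pandas dependency. Since `inspect.getargspec` was removed in Python 3.11, the sketch looks it up defensively and skips that part on newer interpreters.

```python
import inspect
import warnings


def noop(col: int) -> int:
    return col


# getfullargspec understands annotations and reports them separately
# from the positional argument names.
full_spec = inspect.getfullargspec(noop)
print(full_spec.args)
print(full_spec.annotations)

# getargspec no longer exists on Python 3.11+, so fetch it defensively.
legacy_getargspec = getattr(inspect, "getargspec", None)
legacy_error = None
if legacy_getargspec is not None:
    with warnings.catch_warnings():
        # Silence the DeprecationWarning that getargspec emits on Python 3.
        warnings.simplefilter("ignore", DeprecationWarning)
        try:
            legacy_getargspec(noop)
        except ValueError as exc:
            # On Python 3, annotated functions are rejected with ValueError.
            legacy_error = exc
print(legacy_error)
```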
|
cc @ueshin, @BryanCutler, @icexelloss FYI. |
|
@HyukjinKwon your test definitely makes sense. Yeah, the syntax error in Python 2 is why I wasn't sure how to go about testing this in the first place. This certainly gets the job done. |
|
Test build #87948 has finished for PR 20728 at commit
|
HyukjinKwon
left a comment
LGTM
|
Will merge this one if there are no more comments in a few days. |
|
LGTM. |
|
Merged to master and branch-2.3. |
…e-annotated functions ## What changes were proposed in this pull request? Check python version to determine whether to use `inspect.getargspec` or `inspect.getfullargspec` before applying `pandas_udf` core logic to a function. The former is python2.7 (deprecated in python3) and the latter is python3.x. The latter correctly accounts for type annotations, which are syntax errors in python2.x. ## How was this patch tested? Locally, on python 2.7 and 3.6. Author: Michael (Stu) Stewart <[email protected]> Closes #20728 from mstewart141/pandas_udf_fix. (cherry picked from commit 7965c91) Signed-off-by: hyukjinkwon <[email protected]>
|
LGTM too |
What changes were proposed in this pull request?
Check python version to determine whether to use `inspect.getargspec` or `inspect.getfullargspec` before applying `pandas_udf` core logic to a function. The former is python2.7 (deprecated in python3) and the latter is python3.x. The latter correctly accounts for type annotations, which are syntax errors in python2.x.
How was this patch tested?
Locally, on python 2.7 and 3.6.
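The version check described above can be sketched as a standalone helper. This is a hedged illustration, not the code merged in the PR: `get_argspec` and `plain` are hypothetical names, and the idea that callers only read the `args` and `varargs` fields is an assumption made for the example.

```python
import inspect
import sys


def get_argspec(f):
    """Hypothetical helper: choose the inspection call by interpreter version."""
    if sys.version_info[0] < 3:
        # Python 2 only has getargspec.
        return inspect.getargspec(f)
    # Python 3: getfullargspec also copes with annotated functions.
    return inspect.getfullargspec(f)


def plain(col):
    return col


spec = get_argspec(plain)
print(spec.args)
print(spec.varargs)
```

Both return types expose `args` and `varargs`, so code that reads only those two fields does not need to know which branch was taken.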