Commit 328dea6

mstewart141 authored and HyukjinKwon committed
[SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE pandas_udf with keyword args
## What changes were proposed in this pull request?

Add documentation about the limitations of `pandas_udf` with keyword arguments and related concepts, like `functools.partial` fn objects.

NOTE: intermediate commits on this PR show some of the steps that can be taken to fix some (but not all) of these pain points.

### Survey of problems we face today:

(Initialize) Note: python 3.6 and spark 2.4snapshot.

```
from pyspark.sql import SparkSession
import inspect, functools
from pyspark.sql.functions import pandas_udf, PandasUDFType, col, lit, udf

spark = SparkSession.builder.getOrCreate()
print(spark.version)

df = spark.range(1,6).withColumn('b', col('id') * 2)

def ok(a,b): return a+b
```

Using a keyword argument at the call site `b=...` (and yes, *full* stack trace below, haha):

```
---> 14 df.withColumn('ok', pandas_udf(f=ok, returnType='bigint')('id', b='id')).show() # no kwargs

TypeError: wrapper() got an unexpected keyword argument 'b'
```

Using partial with a keyword argument where the kw-arg is the first argument of the fn:
*(Aside: kind of interesting that lines 15,16 work great and then 17 explodes)*

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-e9f31b8799c1> in <module>()
     15 df.withColumn('ok', pandas_udf(f=functools.partial(ok, 7), returnType='bigint')('id')).show()
     16 df.withColumn('ok', pandas_udf(f=functools.partial(ok, b=7), returnType='bigint')('id')).show()
---> 17 df.withColumn('ok', pandas_udf(f=functools.partial(ok, a=7), returnType='bigint')('id')).show()

/Users/stu/ZZ/spark/python/pyspark/sql/functions.py in pandas_udf(f, returnType, functionType)
   2378         return functools.partial(_create_udf, returnType=return_type, evalType=eval_type)
   2379     else:
-> 2380         return _create_udf(f=f, returnType=return_type, evalType=eval_type)
   2381
   2382

/Users/stu/ZZ/spark/python/pyspark/sql/udf.py in _create_udf(f, returnType, evalType)
     54                 argspec.varargs is None:
     55             raise ValueError(
---> 56                 "Invalid function: 0-arg pandas_udfs are not supported. "
     57                 "Instead, create a 1-arg pandas_udf and ignore the arg in your function."
     58             )

ValueError: Invalid function: 0-arg pandas_udfs are not supported. Instead, create a 1-arg pandas_udf and ignore the arg in your function.
```

Author: Michael (Stu) Stewart <[email protected]>

Closes #20900 from mstewart141/udfkw2.

(cherry picked from commit 087fb31)
Signed-off-by: hyukjinkwon <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
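
The difference between lines 15, 16 and 17 above can be reproduced without Spark. The following is a minimal sketch, assuming (as the traceback suggests) that the check in `_create_udf` looks only at the positional `args` and `varargs` reported by `inspect`; `positional_arg_count` is a hypothetical helper and not part of PySpark.

```python
import functools
import inspect


def ok(a, b):
    return a + b


def positional_arg_count(fn):
    """Hypothetical helper: number of positional args that inspect reports for fn."""
    return len(inspect.getfullargspec(fn).args)


# Binding positionally consumes `a` and leaves `b` as a positional parameter.
first_bound_positionally = functools.partial(ok, 7)

# Binding `b` by keyword leaves `a` as a positional parameter.
second_bound_by_keyword = functools.partial(ok, b=7)

# Binding `a` by keyword makes `a` and every parameter after it keyword-only,
# so inspect reports no positional parameters at all.
first_bound_by_keyword = functools.partial(ok, a=7)

for label, fn in [
    ("partial(ok, 7)", first_bound_positionally),
    ("partial(ok, b=7)", second_bound_by_keyword),
    ("partial(ok, a=7)", first_bound_by_keyword),
]:
    spec = inspect.getfullargspec(fn)
    print(label, "args:", spec.args, "kwonlyargs:", spec.kwonlyargs)
```

Under that assumption, only the last partial looks like a 0-arg function to the check, which would explain why it alone raises the `ValueError`.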
1 parent 2fd7aca commit 328dea6

1 file changed: +4 -0 lines changed

python/pyspark/sql/functions.py

Lines changed: 4 additions & 0 deletions
@@ -2123,6 +2123,8 @@ def udf(f=None, returnType=StringType()):
         in boolean expressions and it ends up with being executed all internally. If the functions
         can fail on special rows, the workaround is to incorporate the condition into the functions.
 
+    .. note:: The user-defined functions do not take keyword arguments on the calling side.
+
     :param f: python function if used as a standalone function
     :param returnType: the return type of the user-defined function. The value can be either a
         :class:`pyspark.sql.types.DataType` object or a DDL-formatted type string.
@@ -2252,6 +2254,8 @@ def pandas_udf(f=None, returnType=None, functionType=None):
     .. note:: The user-defined functions do not support conditional expressions or short circuiting
         in boolean expressions and it ends up with being executed all internally. If the functions
         can fail on special rows, the workaround is to incorporate the condition into the functions.
+
+    .. note:: The user-defined functions do not take keyword arguments on the calling side.
     """
     # decorator @pandas_udf(returnType, functionType)
     is_decorator = f is None or isinstance(f, (str, DataType))
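
One way to stay within the limitation documented by the new note is to move keyword arguments into a thin wrapper, so that the function handed to `udf` or `pandas_udf` takes positional arguments only. The following is a sketch with plain Python values standing in for columns; `ok_with_a_fixed` is a hypothetical name, and `ok` is the function from the commit message.

```python
import inspect


def ok(a, b):
    return a + b


def ok_with_a_fixed(b):
    # The keyword arguments live inside the wrapper, so callers (and any
    # signature inspection) see one ordinary positional parameter.
    return ok(a=7, b=b)


spec = inspect.getfullargspec(ok_with_a_fixed)
print(spec.args)           # ['b']
print(ok_with_a_fixed(3))  # 10
```

Assuming the `df` from the commit message, the wrapper would then be passed in place of the partial, for example `pandas_udf(f=ok_with_a_fixed, returnType='bigint')('id')`, with the column supplied positionally.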