[SPARK-29240][PYTHON] Pass Py4J column instance to support PySpark column in element_at function #25950

HyukjinKwon · 2019-09-27T13:04:39Z

What changes were proposed in this pull request?

This PR makes element_at in PySpark able to take PySpark Column instances.

Why are the changes needed?

To match the Scala side. Accepting a Column here appears to have been intended, but it did not work because of a bug.

Does this PR introduce any user-facing change?

Yes. See below:

from pyspark.sql import functions as F
x = spark.createDataFrame([([1,2,3],1),([4,5,6],2),([7,8,9],3)],['list','num'])
x.withColumn('aa',F.element_at('list',x.num.cast('int'))).show()

Before:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/functions.py", line 2059, in element_at
    return Column(sc._jvm.functions.element_at(_to_java_column(col), extraction))
  File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1277, in __call__
  File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1241, in _build_args
  File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1228, in _get_args
  File "/.../forked/spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_collections.py", line 500, in convert
  File "/.../spark/python/pyspark/sql/column.py", line 344, in __iter__
    raise TypeError("Column is not iterable")
TypeError: Column is not iterable

After:

+---------+---+---+
|     list|num| aa|
+---------+---+---+
|[1, 2, 3]|  1|  1|
|[4, 5, 6]|  2|  5|
|[7, 8, 9]|  3|  9|
+---------+---+---+
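For context on the "Before" traceback: the Python Column was handed to Py4J as-is, Py4J fell back to its collection converters, and iterating over the object hit Column.__iter__. A minimal pure-Python sketch of that interaction follows. FakeColumn, fake_list_convert, and try_convert are hypothetical stand-ins written for illustration, not the real PySpark or Py4J code.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (not the real class)."""

    def __init__(self, name):
        self.name = name

    def __iter__(self):
        # Mirrors the last frame of the traceback: __iter__ exists only
        # to refuse iteration with a clear message.
        raise TypeError("Column is not iterable")


def fake_list_convert(obj):
    """Hypothetical converter: treats anything exposing __iter__ as a list."""
    if not hasattr(obj, "__iter__"):
        raise ValueError("cannot convert")
    return [element for element in obj]


def try_convert(obj):
    """Return the converted list, or the error message if iteration is refused."""
    try:
        return fake_list_convert(obj)
    except TypeError as exc:
        return str(exc)


result_list = try_convert([1, 2, 3])
result_column = try_convert(FakeColumn("num"))
print(result_list)
print(result_column)
```

The point of the sketch is only that an object which defines __iter__ looks convertible to a duck-typing converter, so the failure surfaces during iteration rather than as a clearer "unsupported argument" error.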

How was this patch tested?

Manually tested with literals, Python native types, and PySpark Column instances.
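The three input kinds above can be mimicked without a JVM. The sketch below assumes that lit returns an existing column unchanged and wraps any other value as a literal, and that `_jc` is the handle Py4J can pass directly. FakeJavaColumn, FakeColumn, fake_lit, and fake_element_at are hypothetical names used only for illustration.

```python
class FakeJavaColumn:
    """Hypothetical stand-in for the JVM-side column handle."""

    def __init__(self, expr):
        self.expr = expr


class FakeColumn:
    """Hypothetical stand-in for a PySpark Column wrapping a JVM handle."""

    def __init__(self, jc):
        self._jc = jc


def fake_lit(value):
    """Pass an existing column through; wrap anything else as a literal."""
    if isinstance(value, FakeColumn):
        return value
    return FakeColumn(FakeJavaColumn("literal({!r})".format(value)))


def fake_element_at(col_name, extraction):
    # Same shape as the patched call: normalize first, then hand over
    # the unwrapped handle instead of the Python-side object.
    jc = fake_lit(extraction)._jc
    return "element_at({}, {})".format(col_name, jc.expr)


by_literal = fake_element_at("list", 1)
by_key = fake_element_at("map", "a")
by_column = fake_element_at("list", FakeColumn(FakeJavaColumn("num")))
print(by_literal)
print(by_key)
print(by_column)
```

Normalizing every argument through one function keeps the call site to a single code path, which is the design choice the patch takes.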

HyukjinKwon · 2019-09-27T13:04:59Z

python/pyspark/sql/functions.py

    sc = SparkContext._active_spark_context
-    return Column(sc._jvm.functions.element_at(_to_java_column(col), extraction))
+    return Column(sc._jvm.functions.element_at(
+        _to_java_column(col), lit(extraction)._jc))  # noqa: F821 'lit' is dynamically defined.


Otherwise:

flake8 checks failed:
./python/pyspark/sql/functions.py:2059:70: F821 undefined name 'lit'
    return Column(sc._jvm.functions.element_at(_to_java_column(col), lit(extraction)._jc))
                                                                     ^
1     F821 undefined name 'lit'
1
1
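As a rough illustration of why the noqa marker is needed: a name injected into the module namespace at import time works at runtime, but a static checker finds no definition or assignment for it. The sketch below assumes a globals()-based pattern. `_create_function` here is a simplified hypothetical factory, and element_at_arg is a made-up helper, not the code in functions.py.

```python
def _create_function(name):
    """Build a trivial function; stand-in for a factory wrapping a JVM function."""
    def _(value):
        return "{}({!r})".format(name, value)
    _.__name__ = name
    return _


# Names are injected into the module namespace in a loop at import time.
for _name in ["lit", "col"]:
    globals()[_name] = _create_function(_name)


def element_at_arg(extraction):
    # Resolves at runtime through the module globals, but a static checker
    # sees no `def lit` or `lit = ...`, so it would report F821 here
    # without the marker.
    return lit(extraction)  # noqa: F821


wrapped = element_at_arg(2)
print(wrapped)
```

The trailing comment suppresses only that one check on that one line, so other undefined names in the file would still be reported.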

SparkQA · 2019-09-27T13:33:22Z

Test build #111482 has finished for PR 25950 at commit a86d188.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun

+1, LGTM. Thank you, @HyukjinKwon. Merged to master/2.4.

…lumn in element_at function

Closes #25950 from HyukjinKwon/SPARK-29240.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit fda0e6e)
Signed-off-by: Dongjoon Hyun <[email protected]>

Pass Py4J columm instance to support PySpark column in element_at function

a86d188

dongjoon-hyun added the PYSPARK label Sep 27, 2019

dongjoon-hyun approved these changes Sep 27, 2019

dongjoon-hyun closed this in fda0e6e Sep 27, 2019

zero323 mentioned this pull request Sep 28, 2019

Sync with changes merged after 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4 zero323/pyspark-stubs#230

Closed

47 tasks

HyukjinKwon deleted the SPARK-29240 branch March 3, 2020 01:17
