@mengxr mengxr commented May 27, 2015

~~A PythonUDT shouldn't be serialized into external Scala types in PythonRDD. I'm not sure whether this should fix one of the bugs related to SQL UDT/UDF in PySpark.~~

The fix above didn't work, so I added a workaround: if a Python UDF is applied to a Python UDT, the UDF is given the Python SQL types as its inputs. This is still incorrect, but at least it doesn't throw exceptions on the Scala side. @davies @harsha2010
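
For readers following the thread, here is a minimal sketch of the failure mode (a UDT's serialize or deserialize being called more than once on the same value) and of one way a workaround could tolerate it. Everything in it is hypothetical: `ToyVector`, `StrictUDT`, `TolerantUDT`, and the tuple layout are illustrative stand-ins written in plain Python, not the `VectorUDT` code changed by this patch.

```python
from typing import List, Tuple


class ToyVector:
    """Hypothetical user-facing type, standing in for an MLlib vector."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def __eq__(self, other):
        return isinstance(other, ToyVector) and self.values == other.values


# Hypothetical "SQL-side" representation: (size, values).
Serialized = Tuple[int, List[float]]


class StrictUDT:
    """Assumes serialize/deserialize are each called exactly once."""

    def serialize(self, obj) -> Serialized:
        if not isinstance(obj, ToyVector):
            raise TypeError("expected ToyVector, got %s" % type(obj).__name__)
        return (len(obj.values), list(obj.values))

    def deserialize(self, datum) -> ToyVector:
        if not isinstance(datum, tuple):
            raise TypeError("expected tuple, got %s" % type(datum).__name__)
        _size, values = datum
        return ToyVector(values)


class TolerantUDT(StrictUDT):
    """Workaround sketch: pass through values already in the target form."""

    def serialize(self, obj) -> Serialized:
        if isinstance(obj, tuple):  # already serialized, return unchanged
            return obj
        return super().serialize(obj)

    def deserialize(self, datum) -> ToyVector:
        if isinstance(datum, ToyVector):  # already deserialized
            return datum
        return super().deserialize(datum)
```

With `StrictUDT`, calling `serialize` on its own output raises a `TypeError`, which is the kind of exception a repeated call can surface. `TolerantUDT` makes both directions idempotent, so a second call is harmless. As noted above, this only hides the repeated call; it does not make the inputs passed to the UDF correct.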

@mengxr mengxr changed the title [SPARK-7903] [MLLIB] Do not serialize Python UDTs in PythonRDD [WIP][SPARK-7903] [MLLIB] Do not serialize Python UDTs in PythonRDD May 27, 2015

SparkQA commented May 27, 2015

Test build #33609 has finished for PR 6442 at commit 5c98dce.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr mengxr changed the title [WIP][SPARK-7903] [MLLIB] Do not serialize Python UDTs in PythonRDD [SPARK-7911] [MLLIB] A workaround for VectorUDT serialize (or deserialize) being called multiple times May 28, 2015

SparkQA commented May 28, 2015

Test build #33639 has finished for PR 6442 at commit c257d2a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

davies commented May 28, 2015

LGTM.

asfgit pushed a commit that referenced this pull request May 28, 2015
…lize) being called multiple times

~~A PythonUDT shouldn't be serialized into external Scala types in PythonRDD. I'm not sure whether this should fix one of the bugs related to SQL UDT/UDF in PySpark.~~

The fix above didn't work, so I added a workaround: if a Python UDF is applied to a Python UDT, the UDF is given the Python SQL types as its inputs. This is still incorrect, but at least it doesn't throw exceptions on the Scala side. davies harsha2010

Author: Xiangrui Meng <[email protected]>

Closes #6442 from mengxr/SPARK-7903 and squashes the following commits:

c257d2a [Xiangrui Meng] add a workaround for VectorUDT

(cherry picked from commit 530efe3)
Signed-off-by: Xiangrui Meng <[email protected]>

mengxr commented May 28, 2015

Merged into master and branch-1.4.

@asfgit asfgit closed this in 530efe3 May 28, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015