[SPARK-7911] [MLLIB] A workaround for VectorUDT serialize (or deserialize) being called multiple times #6442

mengxr · 2015-05-27T20:01:36Z

~~A PythonUDT shouldn't be serialized into external Scala types in PythonRDD. I'm not sure whether this should fix one of the bugs related to SQL UDT/UDF in PySpark.~~

The fix above didn't work, so I added a workaround. If a Python UDF is applied to a Python UDT, the UDF receives the Python SQL types as inputs. This is still incorrect, but at least it doesn't throw exceptions on the Scala side. @davies @harsha2010
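For readers without the diff at hand, here is a minimal sketch of the general idea behind this kind of workaround: make `serialize` and `deserialize` tolerate being called more than once by passing through values that are already in the target form. This is not the code from this PR. `DenseVec`, `TolerantVectorUDT`, and the `(tag, values)` tuple layout are hypothetical names and shapes chosen for illustration, and nothing here uses the real PySpark API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DenseVec:
    """Hypothetical stand-in for a user-facing vector type."""
    values: Tuple[float, ...]


class TolerantVectorUDT:
    """Hypothetical UDT whose serialize/deserialize are safe to call twice."""

    DENSE_TAG = 1  # assumed tag marking the dense layout in the serialized tuple

    def serialize(self, obj):
        # Already in serialized form (e.g. serialize was called twice): pass through.
        if isinstance(obj, tuple):
            return obj
        if isinstance(obj, DenseVec):
            return (self.DENSE_TAG, obj.values)
        raise TypeError(f"cannot serialize {type(obj).__name__}")

    def deserialize(self, datum):
        # Already a user-facing object (e.g. deserialize was called twice): pass through.
        if isinstance(datum, DenseVec):
            return datum
        if isinstance(datum, tuple) and len(datum) == 2 and datum[0] == self.DENSE_TAG:
            return DenseVec(tuple(datum[1]))
        raise TypeError(f"cannot deserialize {datum!r}")


udt = TolerantVectorUDT()
vec = DenseVec((1.0, 2.0))
once = udt.serialize(vec)
twice = udt.serialize(once)  # second call is a no-op instead of an error
```

The pass-through branches only hide the double call; they do not fix the missing separation between internal and user-facing types, which matches the "still incorrect" caveat above.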

SparkQA · 2015-05-27T22:01:35Z

Test build #33609 has finished for PR 6442 at commit 5c98dce.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-05-28T05:06:21Z

Test build #33639 has finished for PR 6442 at commit c257d2a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

davies · 2015-05-28T18:01:30Z

LGTM.

…lize) being called multiple times

~~A PythonUDT shouldn't be serialized into external Scala types in PythonRDD. I'm not sure whether this should fix one of the bugs related to SQL UDT/UDF in PySpark.~~

The fix above didn't work, so I added a workaround. If a Python UDF is applied to a Python UDT, the UDF receives the Python SQL types as inputs. This is still incorrect, but at least it doesn't throw exceptions on the Scala side. davies harsha2010

Author: Xiangrui Meng <[email protected]>

Closes #6442 from mengxr/SPARK-7903 and squashes the following commits:

c257d2a [Xiangrui Meng] add a workaround for VectorUDT

(cherry picked from commit 530efe3)
Signed-off-by: Xiangrui Meng <[email protected]>

mengxr · 2015-05-28T19:04:13Z

Merged into master and branch-1.4.


mengxr changed the title ~~[SPARK-7903] [MLLIB] Do not serialize Python UDTs in PythonRDD~~ [WIP][SPARK-7903] [MLLIB] Do not serialize Python UDTs in PythonRDD May 27, 2015

add a workaround for VectorUDT

c257d2a

mengxr force-pushed the SPARK-7903 branch from 5c98dce to c257d2a Compare May 28, 2015 03:16

mengxr changed the title ~~[WIP][SPARK-7903] [MLLIB] Do not serialize Python UDTs in PythonRDD~~ [SPARK-7911] [MLLIB] A workaround for VectorUDT serialize (or deserialize) being called multiple times May 28, 2015

asfgit closed this in 530efe3 May 28, 2015
