Skip to content

Conversation

@zhengruifeng
Copy link
Contributor

What changes were proposed in this pull request?

register following classes in Kryo:
org.apache.spark.mllib.regression.LabeledPoint
org.apache.spark.mllib.clustering.VectorWithNorm
org.apache.spark.ml.feature.LabeledPoint
org.apache.spark.ml.tree.impl.TreePoint

org.apache.spark.ml.tree.impl.BaggedPoint seems also need to be registered, but I don't know how to do it in this safe way.
@WeichenXu123 @cloud-fan

How was this patch tested?

added tests

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Was thinking it's kind of overkill to make new test suites to test that each of several classes serializes via the same mechanism. I'd imagine one test case for all of them is fine, somewhere. I don't feel strongly about it.

@SparkQA
Copy link

SparkQA commented Dec 12, 2017

Test build #84765 has finished for PR 19950 at commit 0e825c5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng zhengruifeng force-pushed the labeled_kryo branch 2 times, most recently from 183868c to 024d835 Compare December 13, 2017 03:37
@SparkQA
Copy link

SparkQA commented Dec 13, 2017

Test build #84824 has finished for PR 19950 at commit 183868c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 13, 2017

Test build #84828 has finished for PR 19950 at commit 024d835.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng
Copy link
Contributor Author

zhengruifeng commented Dec 13, 2017

Since VectorWithNorm and TreePoint do not override method equals, we can not directly using === to compare objects.
LabeledPoint is a case class, whose method equals is automaticly supplied

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: space before brace

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there much value in defining this method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think there is not much value to do this, although current testsuites are all like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise these seem like things you can just write in a loop over several objects to test

@SparkQA
Copy link

SparkQA commented Dec 19, 2017

Test build #85085 has finished for PR 19950 at commit 6f51096.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 19, 2017

Test build #85084 has finished for PR 19950 at commit daba630.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class LabeledPointSuite extends SparkFunSuite
  • class TreePointSuite extends SparkFunSuite

"org.apache.spark.ml.feature.OffsetInstance"
"org.apache.spark.ml.feature.OffsetInstance",
"org.apache.spark.ml.feature.LabeledPoint",
"org.apache.spark.ml.tree.impl.TreePoint"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend to write these items in alphabet order, so we can easily check whether it miss some item and easier to add more items in the future.

@WeichenXu123
Copy link
Contributor

And, these items added cannot cover the case in MultilayerPeceptron. Look at FeedForwardTrainer.train, the persisted stacked trainData, the format is RDD[(Double, mllib.Vector)]. The registered classes here do not cover this.

@zhengruifeng
Copy link
Contributor Author

@WeichenXu123 I am not very sure, but it seems that Kryo will automatic ser/deser Tuple2[A, B] type if both A and B have been registered:

scala> import org.apache.spark.SparkConf
import org.apache.spark.SparkConf

scala> import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector, Vectors}
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector, Vectors}

scala> import org.apache.spark.serializer.KryoSerializer
import org.apache.spark.serializer.KryoSerializer

scala> val conf = new SparkConf(false)
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@71b0289b

scala> conf.set("spark.kryo.registrationRequired", "true")
res0: org.apache.spark.SparkConf = org.apache.spark.SparkConf@71b0289b

scala> val ser = new KryoSerializer(conf).newInstance()
ser: org.apache.spark.serializer.SerializerInstance = org.apache.spark.serializer.KryoSerializerInstance@33430fc

scala> class X (val values: Array[Double])
defined class X

scala> val x = new X(Array(1.0,2.0))
x: X = X@69d58731

scala> val x2 = ser.deserialize[X](ser.serialize(x))
java.lang.IllegalArgumentException: Class is not registered: X
Note: To register this class use: kryo.register(X.class);
  at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:488)
  at com.twitter.chill.KryoBase.getRegistration(KryoBase.scala:52)
  at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:97)
  at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517)
  at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:622)
  at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:346)
  ... 49 elided

scala> val t1 = (1.0, Vectors.dense(Array(1.0, 2.0)))
t1: (Double, org.apache.spark.mllib.linalg.Vector) = (1.0,[1.0,2.0])

scala> val t2 = ser.deserialize[(Double, Vector)](ser.serialize(t1))
t2: (Double, org.apache.spark.mllib.linalg.Vector) = (1.0,[1.0,2.0])

@WeichenXu123
Copy link
Contributor

WeichenXu123 commented Dec 20, 2017

@cloud-fan Does it works like: If A and B are any class which is registered, then Type Tuple2[A, B] will be automatically registered for kyro ?

It looks like kyro will automatically handle any Tuple types, as long as inner type in Tuple being registered.

@SparkQA
Copy link

SparkQA commented Dec 20, 2017

Test build #85147 has finished for PR 19950 at commit 3d1d3db.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 20, 2017

Test build #85148 has finished for PR 19950 at commit 604cb7d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Dec 22, 2017

Merged to master

@asfgit asfgit closed this in a36b78b Dec 22, 2017
@zhengruifeng zhengruifeng deleted the labeled_kryo branch December 22, 2017 03:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants