[SPARK-30772][ML][SQL] avoid tuple assignment because it will circumvent the transient tag #27523
I used the following code to check this issue:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Java-serialize a value into a byte array.
def serialise(value: Any): Array[Byte] = {
  val stream: ByteArrayOutputStream = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(stream)
  oos.writeObject(value)
  oos.close()
  stream.toByteArray
}

// Read a value back from a byte array produced by serialise.
def deserialise(bytes: Array[Byte]): Any = {
  val ois = new ObjectInputStream(new ByteArrayInputStream(bytes))
  val value = ois.readObject
  ois.close()
  value
}

// A: a single @transient lazy val.
class A extends Serializable { @transient lazy val a = {println("get a"); System.currentTimeMillis} }
val a = new A
a.a
val a2 = deserialise(serialise(a)).asInstanceOf[A]
a2.a
a.a == a2.a

// B: @transient on a tuple (destructuring) assignment.
class B extends Serializable { @transient lazy val (a,b) = {println("get a & b"); val t = System.currentTimeMillis; (t, -t)} }
val b = new B
b.a
val b2 = deserialise(serialise(b)).asInstanceOf[B]
b2.a
b.a == b2.a
b.b == b2.b

// C: a @transient tuple plus separate @transient accessors.
class C extends Serializable { @transient lazy val t = {println("get a & b"); val t = System.currentTimeMillis; (t, -t)}; @transient lazy val a = t._1; @transient lazy val b = t._2 }
val c = new C
c.a
val c2 = deserialise(serialise(c)).asInstanceOf[C]
c2.a
c.a == c2.a
c.b == c2.b
```

Result:

```
scala> class A extends Serializable { @transient lazy val a = {println("get a"); System.currentTimeMillis} }
defined class A
scala> val a = new A
a: A = A@68ef01a5
scala> a.a
get a
res0: Long = 1581333143300
scala> val a2 = deserialise(serialise(a)).asInstanceOf[A]
a2: A = A@f017dd0
scala> a2.a
get a
res1: Long = 1581333143523
scala> a.a == a2.a
res2: Boolean = false
scala>
scala> class B extends Serializable { @transient lazy val (a,b) = {println("get a & b"); val t = System.currentTimeMillis; (t, -t)} }
defined class B
scala> val b = new B
b: B = B@1d008e61
scala> b.a
get a & b
res3: Long = 1581333144022
scala> val b2 = deserialise(serialise(b)).asInstanceOf[B]
b2: B = B@6ab826bb
scala> b2.a
res4: Long = 1581333144022
scala> b.a == b2.a
res5: Boolean = true
scala> b.b == b2.b
res6: Boolean = true
scala>
scala> class C extends Serializable { @transient lazy val t = {println("get a & b"); val t = System.currentTimeMillis; (t, -t)}; @transient lazy val a = t._1; @transient lazy val b = t._2 }
defined class C
scala> val c = new C
c: C = C@7ec01440
scala> c.a
get a & b
res7: Long = 1581333144575
scala> val c2 = deserialise(serialise(c)).asInstanceOf[C]
c2: C = C@42a698bd
scala> c2.a
get a & b
res8: Long = 1581333144713
scala> c.a == c2.a
res9: Boolean = false
scala> c.b == c2.b
res10: Boolean = false
```

We can see that `@transient` works as expected for `A` and `C`: after deserialization the initializer runs again and the values differ. For `B` it does not: `b2.a` returns the original value without re-running the initializer, so the tuple assignment circumvents the transient tag.
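To see where the serialized state of `B` lives, one can list the fields the compiler emits for a class of the same shape. This is a sketch assuming Scala 2.12/2.13 (the experiment above used 2.13.1); `PatternBound` and `FieldReport` are hypothetical names. The assumption being checked is that scalac rewrites `@transient lazy val (a, b) = e` into a synthetic, tuple-valued lazy val plus `a` and `b` projected from it, and copies the `@transient` annotation only onto `a` and `b`. If that holds, the report shows a transient `a` field next to a non-transient `Tuple2` field. The synthetic field's name (for example `x$1`) is a compiler detail, so the check does not depend on it.

```scala
import java.lang.reflect.Modifier

// Hypothetical class with the same shape as `B` above:
// @transient on a destructuring (tuple) assignment.
class PatternBound extends Serializable {
  @transient lazy val (a, b) = {
    val t = System.currentTimeMillis
    (t, -t)
  }
}

object FieldReport {
  // (field name, erased field type, whether the field is marked transient)
  def fields(cls: Class[_]): Seq[(String, String, Boolean)] =
    cls.getDeclaredFields.toSeq.map { f =>
      (f.getName, f.getType.getSimpleName, Modifier.isTransient(f.getModifiers))
    }

  def main(args: Array[String]): Unit =
    fields(classOf[PatternBound]).foreach(println)
}
```

A non-transient `Tuple2` field would explain the REPL output: after deserialization `b2.a` is recomputed, but only by projecting from the already-deserialized tuple, so `get a & b` is not printed and the old timestamp comes back.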
Friendly ping @srowen. I cannot find any description or discussion of this issue in the Scala community.
Test build #118156 has finished for PR 27523 at commit
srowen left a comment
Weird. @sethah, do you recall this? You added the comment in 1db1c65#diff-668c79317c51f40df870d3404d8a731fR921 several years ago.
So, if this isn't actually avoiding serialization but still works, another option is to decide that's fine and just let them serialize. I don't know which of them are actually really big, but that would keep the existing behavior, right?
Hm, this pattern is a little ugly and adds extra overhead, but it might be necessary, versus just referencing `elementIndexVar._1` later in the code or something similar.
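For comparison, a minimal sketch of the two workarounds being weighed, using hypothetical class names: (1) a `@transient lazy` tuple with `@transient lazy` accessors, the shape of class `C` above, and (2) only the `@transient lazy` tuple, with `pair._1` read at the use site. Assuming Scala 2.12/2.13 semantics, neither puts the tuple into the serialized form, so each initializer should run once before and once after a Java serialization round trip.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Counts how often each tuple initializer runs. Objects are not part of an
// instance's serialized state, so these counters survive the round trip.
object Evals {
  var withAccessors = 0
  var directAccess = 0
}

// Workaround 1 (hypothetical class): transient tuple plus transient accessors.
class WithAccessors extends Serializable {
  @transient lazy val pair: (Long, Long) = {
    Evals.withAccessors += 1
    val t = System.nanoTime()
    (t, -t)
  }
  @transient lazy val first: Long = pair._1
  @transient lazy val second: Long = pair._2
}

// Workaround 2 (hypothetical class): only the transient tuple; callers read pair._1.
class DirectAccess extends Serializable {
  @transient lazy val pair: (Long, Long) = {
    Evals.directAccess += 1
    val t = System.nanoTime()
    (t, -t)
  }
  def firstPlusOne: Long = pair._1 + 1
}

object TransientPairDemo {
  // Java-serializes `value` into memory and reads it back.
  def roundTrip[T](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Forces each tuple once, round-trips the object, forces it again,
  // and returns how many times each initializer ran.
  def evaluationsAfterRoundTrip(): (Int, Int) = {
    Evals.withAccessors = 0
    Evals.directAccess = 0

    val w = new WithAccessors
    w.first
    roundTrip(w).first

    val d = new DirectAccess
    d.firstPlusOne
    roundTrip(d).firstPlusOne

    (Evals.withAccessors, Evals.directAccess)
  }

  def main(args: Array[String]): Unit =
    println(evaluationsAfterRoundTrip())
}
```

The difference between the two is mostly style plus the extra lazy fields of option (1); in both, the `@transient` tag on the tuple is what keeps it out of the serialized state.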
What's the question 😛? See my comments here: #14109. The Databricks style guide also mentions it: https://github.com/databricks/scala-style-guide#destructuring-binds
Oh ha, thanks @sethah! I'd never seen that. Yeah, I was just asking if you knew anything more about why this happens. That's pretty good.
Heh, np. I remember it being really subtle! But that's about all I can give you.
Actually, I do know how big those variables in SQL are. If they are supposed to be transient, I guess that should keep the existing behavior.
retest this please
Test build #118202 has finished for PR 27523 at commit
I'm OK with it. I guess I was just saying that the current behavior is not as intended but still 'works', it seems. Maybe in some cases they aren't that big. But if you're pretty confident they are, then this is a 'win'.
Jenkins retest this please
Test build #118371 has finished for PR 27523 at commit
@srowen Actually, I do not know how big the objects in SQL are, so I agree to leave them alone and only change the ML side.
477f99e to 968ffb3
Test build #118407 has finished for PR 27523 at commit
Oh, I don't think it's 'wrong' to fix it. Or, if we know the intended behavior isn't working, we could just undo the intention (i.e. remove `transient lazy`). Well, either way.
Merged to master.
…ent the transient tag ### What changes were proposed in this pull request? It is stated in [LeastSquaresAggregator](https://github.com/apache/spark/blob/12e1bbaddbb2ef304b5880a62df6683fcc94ea54/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/LeastSquaresAggregator.scala#L188) that: > // do not use tuple assignment above because it will circumvent the transient tag. I then checked this issue with Scala 2.13.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_241). ### Why are the changes needed? To avoid tuple assignment, because it circumvents the transient tag. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing test suites. Closes apache#27523 from zhengruifeng/avoid_tuple_assign_to_transient. Authored-by: zhengruifeng Signed-off-by: Sean Owen
What changes were proposed in this pull request?
It is stated in LeastSquaresAggregator that: "// do not use tuple assignment above because it will circumvent the transient tag".
I then checked this issue with Scala 2.13.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_241).
Why are the changes needed?
To avoid tuple assignment to `@transient lazy val`s, because it circumvents the transient tag.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing test suites.