Skip to content

Conversation

@zhengruifeng
Copy link
Contributor

What changes were proposed in this pull request?

it is said in LeastSquaresAggregator that :

// do not use tuple assignment above because it will circumvent the transient tag

I then check this issue with Scala 2.13.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_241)

Why are the changes needed?

avoid tuple assignment because it will circumvent the transient tag

Does this PR introduce any user-facing change?

No

How was this patch tested?

existing testsuites

@zhengruifeng
Copy link
Contributor Author

I use following code to check this issue:
env: Scala 2.13.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_241)

import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
  
  def serialise(value: Any): Array[Byte] = {
    val stream: ByteArrayOutputStream = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(stream)
    oos.writeObject(value)
    oos.close()
    stream.toByteArray
  }

  def deserialise(bytes: Array[Byte]): Any = {
    val ois = new ObjectInputStream(new ByteArrayInputStream(bytes))
    val value = ois.readObject
    ois.close()
    value
  }



class A extends Serializable { @transient lazy val a = {println("get a"); System.currentTimeMillis} }
val a = new A
a.a
val a2 = deserialise(serialise(a)).asInstanceOf[A]
a2.a
a.a == a2.a

class B extends Serializable { @transient lazy val (a,b) = {println("get a & b"); val t = System.currentTimeMillis; (t, -t)} }
val b = new B
b.a
val b2 = deserialise(serialise(b)).asInstanceOf[B]
b2.a
b.a == b2.a
b.b == b2.b

class C extends Serializable { @transient lazy val t = {println("get a & b"); val t = System.currentTimeMillis; (t, -t)}; @transient lazy val a = t._1; @transient lazy val b = t._2 }
val c = new C
c.a
val c2 = deserialise(serialise(c)).asInstanceOf[C]
c2.a
c.a == c2.a
c.b == c2.b

Result:

scala> class A extends Serializable { @transient lazy val a = {println("get a"); System.currentTimeMillis} }
defined class A

scala> val a = new A
a: A = A@68ef01a5

scala> a.a
get a
res0: Long = 1581333143300

scala> val a2 = deserialise(serialise(a)).asInstanceOf[A]
a2: A = A@f017dd0

scala> a2.a
get a
res1: Long = 1581333143523

scala> a.a == a2.a
res2: Boolean = false

scala> 

scala> class B extends Serializable { @transient lazy val (a,b) = {println("get a & b"); val t = System.currentTimeMillis; (t, -t)} }
defined class B

scala> val b = new B
b: B = B@1d008e61

scala> b.a
get a & b
res3: Long = 1581333144022

scala> val b2 = deserialise(serialise(b)).asInstanceOf[B]
b2: B = B@6ab826bb

scala> b2.a
res4: Long = 1581333144022

scala> b.a == b2.a
res5: Boolean = true

scala> b.b == b2.b
res6: Boolean = true

scala> 

scala> class C extends Serializable { @transient lazy val t = {println("get a & b"); val t = System.currentTimeMillis; (t, -t)}; @transient lazy val a = t._1; @transient lazy val b = t._2 }
defined class C

scala> val c = new C
c: C = C@7ec01440

scala> c.a
get a & b
res7: Long = 1581333144575

scala> val c2 = deserialise(serialise(c)).asInstanceOf[C]
c2: C = C@42a698bd

scala> c2.a
get a & b
res8: Long = 1581333144713

scala> c.a == c2.a
res9: Boolean = false

scala> c.b == c2.b
res10: Boolean = false

We can see that b2.a does not trigger the println, and the fields .a and .b contains the same values after serialization;
while a2.a and c2.a will trigger the println, and the fields change after serialization;

@zhengruifeng zhengruifeng changed the title avoid tuple assignment because it will circumvent the transient tag [SPARK-30772][ML][SQL] avoid tuple assignment because it will circumvent the transient tag Feb 10, 2020
@zhengruifeng
Copy link
Contributor Author

friendly ping @srowen , I can not find in the scala community any description or disscussin on this issue.

@SparkQA
Copy link

SparkQA commented Feb 10, 2020

Test build #118156 has finished for PR 27523 at commit 477f99e.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird. @sethah do you recall this? you added the comment in 1db1c65#diff-668c79317c51f40df870d3404d8a731fR921 several years ago.

So, if this isn't actually avoiding serialization, but still works, another option is to decide that's fine and just let them serialize. I don't know which of them are actually really big, but, that would keep existing behavior right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this pattern is a little ugly / adds extra overhead, but might be necessary vs just referencing "elementIndexVar._1" later in the code or something similar.

@sethah
Copy link
Contributor

sethah commented Feb 11, 2020

What's the question 😛 ? See my comments here: #14109

Also the databricks style guide mentions it: https://github.com/databricks/scala-style-guide#destructuring-binds

@srowen
Copy link
Member

srowen commented Feb 11, 2020

Oh ha thanks @sethah ! never seen that. Yeah I was just asking if you knew anything more about why or what is happening here. That's pretty good.

@sethah
Copy link
Contributor

sethah commented Feb 11, 2020

Heh np, I remember it being really subtle! But that's about all I can give you.

@zhengruifeng
Copy link
Contributor Author

I don't know which of them are actually really big, but, that would keep existing behavior right?

acturally, I do know how big those variables in SQL are. If they are supposed to be transient, I guess that should keep existing behavior.

@zhengruifeng
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Feb 11, 2020

Test build #118202 has finished for PR 27523 at commit 477f99e.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Feb 13, 2020

I'm OK with it, I guess I was just saying that the current behavior is not as intended, but still 'works' it seems. Maybe in some cases they aren't that big. But if you're pretty confident they are then this is a 'win'.

@srowen
Copy link
Member

srowen commented Feb 13, 2020

Jenkins retest this please

@SparkQA
Copy link

SparkQA commented Feb 13, 2020

Test build #118371 has finished for PR 27523 at commit 477f99e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng
Copy link
Contributor Author

@srowen acturally, I do not know how big the objects in SQL are. So I agree to let them alone, and only change the ML side.

init

init

init

init

init
@zhengruifeng zhengruifeng force-pushed the avoid_tuple_assign_to_transient branch from 477f99e to 968ffb3 Compare February 14, 2020 08:13
@SparkQA
Copy link

SparkQA commented Feb 14, 2020

Test build #118407 has finished for PR 27523 at commit 968ffb3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Feb 14, 2020

Oh, I don't think it's 'wrong' to fix. Or if we know the intended behavior isn't working, just undo the intention (i.e. remove transient lazy). Well, either way.

@srowen srowen closed this in 8ebbf85 Feb 16, 2020
@srowen
Copy link
Member

srowen commented Feb 16, 2020

Merged to master

@zhengruifeng zhengruifeng deleted the avoid_tuple_assign_to_transient branch February 17, 2020 03:00
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…ent the transient tag

### What changes were proposed in this pull request?
it is said in [LeastSquaresAggregator](https://github.com/apache/spark/blob/12e1bbaddbb2ef304b5880a62df6683fcc94ea54/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/LeastSquaresAggregator.scala#L188) that :

> // do not use tuple assignment above because it will circumvent the transient tag

I then check this issue with Scala 2.13.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_241)

### Why are the changes needed?
avoid tuple assignment because it will circumvent the transient tag

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
existing testsuites

Closes apache#27523 from zhengruifeng/avoid_tuple_assign_to_transient.

Authored-by: zhengruifeng <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
holdenk pushed a commit to holdenk/spark that referenced this pull request Jun 25, 2020
…ent the transient tag

### What changes were proposed in this pull request?
it is said in [LeastSquaresAggregator](https://github.com/apache/spark/blob/12e1bbaddbb2ef304b5880a62df6683fcc94ea54/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/LeastSquaresAggregator.scala#L188) that :

> // do not use tuple assignment above because it will circumvent the transient tag

I then check this issue with Scala 2.13.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_241)

### Why are the changes needed?
avoid tuple assignment because it will circumvent the transient tag

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
existing testsuites

Closes apache#27523 from zhengruifeng/avoid_tuple_assign_to_transient.

Authored-by: zhengruifeng <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
holdenk pushed a commit to holdenk/spark that referenced this pull request Oct 27, 2020
…ent the transient tag

### What changes were proposed in this pull request?
it is said in [LeastSquaresAggregator](https://github.com/apache/spark/blob/12e1bbaddbb2ef304b5880a62df6683fcc94ea54/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/LeastSquaresAggregator.scala#L188) that :

> // do not use tuple assignment above because it will circumvent the transient tag

I then check this issue with Scala 2.13.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_241)

### Why are the changes needed?
avoid tuple assignment because it will circumvent the transient tag

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
existing testsuites

Closes apache#27523 from zhengruifeng/avoid_tuple_assign_to_transient.

Authored-by: zhengruifeng <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants