Skip to content

Conversation

@davies
Copy link
Contributor

@davies davies commented Nov 16, 2015

Fix the serialization of RoaringBitmap with Kyro serializer

This PR came from metamx#1, thanks to @drcrallen

@SparkQA
Copy link

SparkQA commented Nov 17, 2015

Test build #2067 has finished for PR 9748 at commit 9947bdc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * class BitSet(numBits: Int) extends Serializable\n * class StreamingListener(object):\n

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space before {.

@liancheng
Copy link
Contributor

Lacking background knowledge to sign off this one. Referenced discussions in #9243, metamx#1, SPARK-5949, and SPARK-11016. If understand it correctly, currently RoaringBitmap isn't a direct dependency of Spark, instead it's just a transitive dependency, and Kryo needs it to work properly?

@liancheng
Copy link
Contributor

Oh I didn't notice that #9243 was reverted. So RoaringBitmap is still a direct dependency, and before this PR, Kryo can't serialize it properly. Then this LGTM except for a few minor styling issues.

@srowen
Copy link
Member

srowen commented Nov 17, 2015

Yep, LGTM, thanks for going back and taking care of this loose end @davies

@SparkQA
Copy link

SparkQA commented Nov 17, 2015

Test build #46108 has finished for PR 9748 at commit 7fcbf66.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Nov 18, 2015
Fix the serialization of RoaringBitmap with Kyro serializer

This PR came from metamx#1, thanks to drcrallen

Author: Davies Liu <[email protected]>
Author: Charles Allen <[email protected]>

Closes #9748 from davies/SPARK-11016.

(cherry picked from commit bf25f9b)
Signed-off-by: Davies Liu <[email protected]>
@asfgit asfgit closed this in bf25f9b Nov 18, 2015
@scwf
Copy link
Contributor

scwf commented Dec 8, 2015

@davies here are some problems when deserialize RoaringBitmap. see the examples below:
run this piece of code

import com.esotericsoftware.kryo.io.{Input => KryoInput, Output => KryoOutput}
import java.io.DataInput

    class KryoInputDataInputBridge(input: KryoInput) extends DataInput {
      override def readLong(): Long = input.readLong()
      override def readChar(): Char = input.readChar()
      override def readFloat(): Float = input.readFloat()
      override def readByte(): Byte = input.readByte()
      override def readShort(): Short = input.readShort()
      override def readUTF(): String = input.readString() // readString in kryo does utf8
      override def readInt(): Int = input.readInt()
      override def readUnsignedShort(): Int = input.readShortUnsigned()
      override def skipBytes(n: Int): Int = input.skip(n.toLong).toInt
      override def readFully(b: Array[Byte]): Unit = input.read(b)
      override def readFully(b: Array[Byte], off: Int, len: Int): Unit = input.read(b, off, len)
      override def readLine(): String = throw new UnsupportedOperationException("readLine")
      override def readBoolean(): Boolean = input.readBoolean()
      override def readUnsignedByte(): Int = input.readByteUnsigned()
      override def readDouble(): Double = input.readDouble()
    }

    class KryoOutputDataOutputBridge(output: KryoOutput) extends DataOutput {
      override def writeFloat(v: Float): Unit = output.writeFloat(v)
      // There is no "readChars" counterpart, except maybe "readLine", which is not supported
      override def writeChars(s: String): Unit = throw new UnsupportedOperationException("writeChars")
      override def writeDouble(v: Double): Unit = output.writeDouble(v)
      override def writeUTF(s: String): Unit = output.writeString(s) // writeString in kryo does UTF8
      override def writeShort(v: Int): Unit = output.writeShort(v)
      override def writeInt(v: Int): Unit = output.writeInt(v)
      override def writeBoolean(v: Boolean): Unit = output.writeBoolean(v)
      override def write(b: Int): Unit = output.write(b)
      override def write(b: Array[Byte]): Unit = output.write(b)
      override def write(b: Array[Byte], off: Int, len: Int): Unit = output.write(b, off, len)
      override def writeBytes(s: String): Unit = output.writeString(s)
      override def writeChar(v: Int): Unit = output.writeChar(v.toChar)
      override def writeLong(v: Long): Unit = output.writeLong(v)
      override def writeByte(v: Int): Unit = output.writeByte(v)
    }
    val outStream = new FileOutputStream("D:\\wfserde")
    val output = new KryoOutput(outStream)
    val bitmap = new RoaringBitmap
    bitmap.add(1)
    bitmap.add(3)
    bitmap.add(5)
    bitmap.serialize(new KryoOutputDataOutputBridge(output))
    output.flush()
    output.close()

    val inStream = new FileInputStream("D:\\wfserde")
    val input = new KryoInput(inStream)
    val ret = new RoaringBitmap
    ret.deserialize(new KryoInputDataInputBridge(input))

    println(ret)

this will throw Buffer underflow error:

com.esotericsoftware.kryo.KryoException: Buffer underflow.
    at com.esotericsoftware.kryo.io.Input.require(Input.java:156)
    at com.esotericsoftware.kryo.io.Input.skip(Input.java:131)
    at com.esotericsoftware.kryo.io.Input.skip(Input.java:264)
    at org.apache.spark.sql.SQLQuerySuite$$anonfun$6$KryoInputDataInputBridge$1.skipBytes

after same investigation, i found this is caused by a bug of kryo's Input.skip(long count)(EsotericSoftware/kryo#119) and we call this method in KryoInputDataInputBridge.

So i think we can fix this issue in this two ways:

  1. upgrade the kryo version to 2.23.0 or 2.24.0, which has fix this bug in kryo (i am not sure the upgrade is safe in spark, can you check it? @davies )

  2. we can bypass the kryo's Input.skip(long count) by directly call another skip method in kryo's Input.java(https://github.com/EsotericSoftware/kryo/blob/kryo-2.21/src/com/esotericsoftware/kryo/io/Input.java#L124), i.e. write the bug-fixed version of Input.skip(long count) in KryoInputDataInputBridge's skipBytes method:

   class KryoInputDataInputBridge(input: KryoInput) extends DataInput {
      ...
      override def skipBytes(n: Int): Int = {
        var remaining: Long = n
        while (remaining > 0) {
          val skip = Math.min(Integer.MAX_VALUE, remaining).asInstanceOf[Int]
          input.skip(skip)
          remaining -= skip
        }
        n
     }
      ...
    }

any advice?

@davies
Copy link
Contributor Author

davies commented Dec 8, 2015

@scwf For 1.6 release, it's risky to upgrade Kryo, the second approach will be better, could you send out a PR for it? thanks!

@scwf
Copy link
Contributor

scwf commented Dec 9, 2015

ok, should i send pr to master and branch-1.6 both?

@davies
Copy link
Contributor Author

davies commented Dec 9, 2015

Sending to master should be enough, will be pickled into 1.6.

asfgit pushed a commit that referenced this pull request Dec 9, 2015
…throw Buffer underflow exception

Jira: https://issues.apache.org/jira/browse/SPARK-12222

Deserialize RoaringBitmap using Kryo serializer throw Buffer underflow exception:
```
com.esotericsoftware.kryo.KryoException: Buffer underflow.
	at com.esotericsoftware.kryo.io.Input.require(Input.java:156)
	at com.esotericsoftware.kryo.io.Input.skip(Input.java:131)
	at com.esotericsoftware.kryo.io.Input.skip(Input.java:264)
```

This is caused by a bug of kryo's `Input.skip(long count)`(EsotericSoftware/kryo#119) and we call this method in `KryoInputDataInputBridge`.

Instead of upgrade kryo's version, this pr bypass the  kryo's `Input.skip(long count)` by directly call another `skip` method in kryo's Input.java(https://github.com/EsotericSoftware/kryo/blob/kryo-2.21/src/com/esotericsoftware/kryo/io/Input.java#L124), i.e. write the bug-fixed version of `Input.skip(long count)` in KryoInputDataInputBridge's `skipBytes` method.

more detail link to #9748 (comment)

Author: Fei Wang <[email protected]>

Closes #10213 from scwf/patch-1.

(cherry picked from commit 3934562)
Signed-off-by: Davies Liu <[email protected]>
asfgit pushed a commit that referenced this pull request Dec 9, 2015
…throw Buffer underflow exception

Jira: https://issues.apache.org/jira/browse/SPARK-12222

Deserialize RoaringBitmap using Kryo serializer throw Buffer underflow exception:
```
com.esotericsoftware.kryo.KryoException: Buffer underflow.
	at com.esotericsoftware.kryo.io.Input.require(Input.java:156)
	at com.esotericsoftware.kryo.io.Input.skip(Input.java:131)
	at com.esotericsoftware.kryo.io.Input.skip(Input.java:264)
```

This is caused by a bug of kryo's `Input.skip(long count)`(EsotericSoftware/kryo#119) and we call this method in `KryoInputDataInputBridge`.

Instead of upgrade kryo's version, this pr bypass the  kryo's `Input.skip(long count)` by directly call another `skip` method in kryo's Input.java(https://github.com/EsotericSoftware/kryo/blob/kryo-2.21/src/com/esotericsoftware/kryo/io/Input.java#L124), i.e. write the bug-fixed version of `Input.skip(long count)` in KryoInputDataInputBridge's `skipBytes` method.

more detail link to #9748 (comment)

Author: Fei Wang <[email protected]>

Closes #10213 from scwf/patch-1.
drcrallen pushed a commit to metamx/spark that referenced this pull request Dec 16, 2015
…throw Buffer underflow exception

Jira: https://issues.apache.org/jira/browse/SPARK-12222

Deserialize RoaringBitmap using Kryo serializer throw Buffer underflow exception:
```
com.esotericsoftware.kryo.KryoException: Buffer underflow.
	at com.esotericsoftware.kryo.io.Input.require(Input.java:156)
	at com.esotericsoftware.kryo.io.Input.skip(Input.java:131)
	at com.esotericsoftware.kryo.io.Input.skip(Input.java:264)
```

This is caused by a bug of kryo's `Input.skip(long count)`(EsotericSoftware/kryo#119) and we call this method in `KryoInputDataInputBridge`.

Instead of upgrade kryo's version, this pr bypass the  kryo's `Input.skip(long count)` by directly call another `skip` method in kryo's Input.java(https://github.com/EsotericSoftware/kryo/blob/kryo-2.21/src/com/esotericsoftware/kryo/io/Input.java#L124), i.e. write the bug-fixed version of `Input.skip(long count)` in KryoInputDataInputBridge's `skipBytes` method.

more detail link to apache#9748 (comment)

Author: Fei Wang <[email protected]>

Closes apache#10213 from scwf/patch-1.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants