Skip to content

Conversation

@LantaoJin
Copy link
Contributor

What changes were proposed in this pull request?

HighlyCompressedMapStatus uses RoaringBitmap to record the empty blocks. But RoaringBitmap couldn't be ser/deser with unsafe KryoSerializer.

It's a bug of RoaringBitmap-0.5.11 and fixed in latest version.

This is an update of #24157

How was this patch tested?

Add a UT

@LantaoJin
Copy link
Contributor Author

@srowen @squito Could you help to review this PR? I can not reopen #24157 since a force pushing. Thanks for your time.


conf.set(KRYO_USE_UNSAFE, false)
val safeSer = new KryoSerializer(conf).newInstance()
var actual : RoaringBitmap = safeSer.deserialize(safeSer.serialize(expected))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think the style checker will flag the space before the colon.

}

test("SPARK-27216: Upgrade RoaringBitmap to 0.7.45 to fix Kryo unsafe ser/dser issue") {
val expected = new RoaringBitmap
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny nit: () after the constructor

@SparkQA
Copy link

SparkQA commented Apr 1, 2019

Test build #4675 has finished for PR 24264 at commit 076411b.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@LantaoJin LantaoJin changed the title [SPARK-27216][CORE] Upgrade RoaringBitmap to 0.7.45 [SPARK-27216][CORE] Upgrade RoaringBitmap to 0.7.45 to fix Kryo unsafe ser/dser issue Apr 1, 2019
@LantaoJin
Copy link
Contributor Author

Rename for a better searching or tracing.

super.afterAll()
}

test("SPARK-27216: Upgrade RoaringBitmap to 0.7.45 to fix Kryo unsafe ser/dser issue") {
Copy link
Contributor

@attilapiros attilapiros Apr 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move your test into KryoSerializerSuite where conf.set(KRYO_USE_UNSAFE, false) is given.
And as UnsafeKryoSerializerSuite is inherited from KryoSerializerSuite beside using conf.set(KRYO_USE_UNSAFE, true) in beforeAll() it will inherit your test method so both the safe/unsafe case will be tested this way: once executing KryoSerializerSuite and once UnsafeKryoSerializerSuite.

Summary: you do not need to test both cases in one test method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, thanks for this reminding.

@srowen
Copy link
Member

srowen commented Apr 1, 2019

Run ./dev/test-dependencies.sh --replace-manifest

val expected = new RoaringBitmap()
expected.add(1787)

conf.set(KRYO_USE_UNSAFE, true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont' think this is what @attilapiros meant. I think he meant to (a) leave UnsafeKryoSerializerSuite as it was before and (b) add a new test which just tested serialization of RoaringBitmap with kryo in this class, but didn't tweak confs at all. That way you'd get a test for serializing with kryo.unsafe=false (here in KryoSerializerSuite) and you'd also automatically get a test with kryo.unsafe=true (inherited in UnsafeKryoSerializerSuite).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. I was about to write a similar comment.

@LantaoJin
Copy link
Contributor Author

LantaoJin commented Apr 2, 2019

Uh... @attilapiros Is it proper now?

@attilapiros
Copy link
Contributor

@LantaoJin No.

  1. revert the deletion of the original UnsafeKryoSerializerSuite.scala
  2. remove these lines:
    class UnsafeKryoSerializerSuite extends SparkFunSuite with SharedSparkContext {
    conf.set(SERIALIZER, classOf[KryoSerializer].getName)
    conf.set(KRYO_USER_REGISTRATORS, classOf[MyRegistrator].getName)
    conf.set(KRYO_USE_UNSAFE, true)
    test("test RoaringBitmap ser/dser with Kryo unsafe enabled") {
    val expected = new RoaringBitmap()
    expected.add(1787)
    val unsafeSer = new KryoSerializer(conf).newInstance()
    val actual: RoaringBitmap = unsafeSer.deserialize(unsafeSer.serialize(expected))
    assert(actual === expected)
    }
    }

@attilapiros
Copy link
Contributor

Finally via sbt test both UnsafeKryoSerializerSuite and KryoSerializerSuite and you should see your RoaringBitmap test executed twice, you can do that like this:

$ ./build/sbt
> project core
> testOnly *.UnsafeKryoSerializerSuite
> testOnly *.KryoSerializerSuite

@attilapiros
Copy link
Contributor

You can revert the deletion by:

git checkout f799e34962c24bb55662e7e570195d8f5aa5799a core/src/test/scala/org/apache/spark/serializer/UnsafeKryoSerializerSuite.scala

But please double check the commit ID (my laptop used for developing Spark runs an OS update and this is my other machine...).

@LantaoJin
Copy link
Contributor Author

Finally via sbt test both UnsafeKryoSerializerSuite and KryoSerializerSuite and you should see your RoaringBitmap test executed twice, you can do that like this:

$ ./build/sbt
> project core
> testOnly *.UnsafeKryoSerializerSuite
> testOnly *.KryoSerializerSuite

Everything looks fine from my side.
> testOnly *.UnsafeKryoSerializerSuite
[info] UnsafeKryoSerializerSuite:
[info] - test RoaringBitmap ser/dser with Kryo unsafe enabled (264 milliseconds)
[info] ScalaTest
[info] Run completed in 4 seconds, 430 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
[success] Total time: 97 s, completed Apr 2, 2019 5:32:05 PM
> testOnly *.KryoSerializerSuite
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
[info] Packaging /Users/lajin/git/my/spark/core/target/scala-2.12/spark-core_2.12-3.0.0-SNAPSHOT.jar ...
[info] Done packaging.
[info] KryoSerializerSuite:
[info] - SPARK-7392 configuration limits (22 milliseconds)
....
[info] - test RoaringBitmap ser/dser with Kryo unsafe disabled (6 milliseconds)
[info] ScalaTest
[info] Run completed in 5 seconds, 503 milliseconds.
[info] Total number of tests run: 34
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 34, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 34, Failed 0, Errors 0, Passed 34

@LantaoJin
Copy link
Contributor Author

@attilapiros Oh, I read your comment again. I guess I have understood it. I will refactor it.

@LantaoJin
Copy link
Contributor Author

@attilapiros What about it now?

@attilapiros
Copy link
Contributor

@LantaoJin Now your change is almost perfect.

But as there was no need to import RoaringBitmap for your test it made me think what kind of tests were here before your PR regarding this class.

I have found this:

test("SPARK-12222: deserialize RoaringBitmap throw Buffer underflow exception") {
withTempDir { dir =>
val tmpfile = dir.toString + "/RoaringBitmap"
val outStream = new FileOutputStream(tmpfile)
val output = new KryoOutput(outStream)
val bitmap = new RoaringBitmap
bitmap.add(1)
bitmap.add(3)
bitmap.add(5)
// Ignore Kryo because it doesn't use writeObject
bitmap.serialize(new KryoOutputObjectOutputBridge(null, output))
output.flush()
output.close()
val inStream = new FileInputStream(tmpfile)
val input = new KryoInput(inStream)
val ret = new RoaringBitmap
// Ignore Kryo because it doesn't use readObject
ret.deserialize(new KryoInputObjectInputBridge(null, input))
input.close()
assert(ret == bitmap)
}
}

I think it is out-dated and this test can be removed as the statement deserialize RoaringBitmap throw Buffer underflow exception is false, what do you think?

@LantaoJin
Copy link
Contributor Author

I think it's out-dated. But it needs double check if we plan to merge this PR to multiple branches.

@attilapiros
Copy link
Contributor

As this change contradicts with that statement it is safe to remove that test.

@LantaoJin
Copy link
Contributor Author

I've checked branch 2.1 and above. The kryo versions in them are all 3.0 or above. According to the description in SPARK-12222, it should be safe. I will remove it.

@attilapiros
Copy link
Contributor

Jenkins retest this please.

@squito
Copy link
Contributor

squito commented Apr 2, 2019

lgtm assuming tests pass. Thanks for the work on this @LantaoJin & @attilapiros

@SparkQA
Copy link

SparkQA commented Apr 2, 2019

Test build #4679 has finished for PR 24264 at commit 3397957.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@lemire
Copy link
Contributor

lemire commented Apr 3, 2019

You might want to consider upgrading directly to RoaringBitmap 0.8.0. We added serialize(ByteBuffer) and deserialize(ByteBuffer) methods that are quite interesting (and fast).

cc @richardstartin @blacelle

@squito
Copy link
Contributor

squito commented Apr 4, 2019

@lemire I think that's a good idea for something to do eventually, but I think that can be handled separately. IIUC, to take advantage of those performance improvements, we need to change how we're serializing the RoaringBitmaps inside spark. This is a bug fix we can get in now.

I filed https://issues.apache.org/jira/browse/SPARK-27367 for this if anybody wants to follow up with that

@squito
Copy link
Contributor

squito commented Apr 4, 2019

I merged this to master, but there were merge conflicts going back farther. @LantaoJin would you like to open another pr, against branch 2.4 at least?

@asfgit asfgit closed this in 69dd44a Apr 4, 2019
srowen pushed a commit that referenced this pull request Apr 4, 2019
…fix Kryo unsafe ser/dser issue

## What changes were proposed in this pull request?

Back-port of #24264 to branch-2.4.

HighlyCompressedMapStatus uses RoaringBitmap to record the empty blocks. But RoaringBitmap couldn't be ser/deser with unsafe KryoSerializer.

It's a bug of RoaringBitmap-0.5.11 and fixed in latest version.

## How was this patch tested?

Add a UT

Closes #24290 from LantaoJin/SPARK-27216_BACKPORT-2.4.

Authored-by: LantaoJin <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
srowen pushed a commit that referenced this pull request Apr 4, 2019
…fix Kryo unsafe ser/dser issue

## What changes were proposed in this pull request?

Back-port of #24264 to branch-2.3.

HighlyCompressedMapStatus uses RoaringBitmap to record the empty blocks. But RoaringBitmap couldn't be ser/deser with unsafe KryoSerializer.

It's a bug of RoaringBitmap-0.5.11 and fixed in latest version.

## How was this patch tested?

Add a UT

Closes #24291 from LantaoJin/SPARK-27216_BACKPORT-2.3.

Authored-by: LantaoJin <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
…fix Kryo unsafe ser/dser issue

## What changes were proposed in this pull request?

Back-port of apache#24264 to branch-2.4.

HighlyCompressedMapStatus uses RoaringBitmap to record the empty blocks. But RoaringBitmap couldn't be ser/deser with unsafe KryoSerializer.

It's a bug of RoaringBitmap-0.5.11 and fixed in latest version.

## How was this patch tested?

Add a UT

Closes apache#24290 from LantaoJin/SPARK-27216_BACKPORT-2.4.

Authored-by: LantaoJin <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 25, 2019
…fix Kryo unsafe ser/dser issue

## What changes were proposed in this pull request?

Back-port of apache#24264 to branch-2.4.

HighlyCompressedMapStatus uses RoaringBitmap to record the empty blocks. But RoaringBitmap couldn't be ser/deser with unsafe KryoSerializer.

It's a bug of RoaringBitmap-0.5.11 and fixed in latest version.

## How was this patch tested?

Add a UT

Closes apache#24290 from LantaoJin/SPARK-27216_BACKPORT-2.4.

Authored-by: LantaoJin <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
…fix Kryo unsafe ser/dser issue

## What changes were proposed in this pull request?

Back-port of apache#24264 to branch-2.4.

HighlyCompressedMapStatus uses RoaringBitmap to record the empty blocks. But RoaringBitmap couldn't be ser/deser with unsafe KryoSerializer.

It's a bug of RoaringBitmap-0.5.11 and fixed in latest version.

## How was this patch tested?

Add a UT

Closes apache#24290 from LantaoJin/SPARK-27216_BACKPORT-2.4.

Authored-by: LantaoJin <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants