Set RoaringBitmapFactory.DEFAULT_COMPRESS_RUN_ON_SERIALIZATION = true #36

@lemire

Currently, Roaring bitmaps are only usable in Druid without run compression. We have RoaringBitmapFactory.DEFAULT_COMPRESS_RUN_ON_SERIALIZATION set to false. I'm all for conservatism, as nobody likes to introduce bugs in complex systems... but in this instance, I can see no practical reason to leave performance gains on the table.

We have demonstrated at length, on various forums and even in a peer-reviewed paper, that run compression makes Roaring generally better...

  • Daniel Lemire, Gregory Ssi-Yan-Kai, Owen Kaser, Consistently faster and smaller compressed bitmaps with Roaring, Software: Practice and Experience, 2016. https://arxiv.org/abs/1603.06549
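To make the size argument concrete, here is a minimal, self-contained sketch of the per-container arithmetic. It does not use the RoaringBitmap library; the class and method names are hypothetical, and the byte counts are the payload sizes as I understand them from the paper (2 bytes per value for an array container, 8192 bytes for a bitmap container, 2 + 4 bytes per run for a run container). In the real library, I believe the equivalent decision is triggered by calling `runOptimize()` before serializing.

```java
// Toy model of container payload sizes; NOT the RoaringBitmap implementation.
// All names here are hypothetical and exist only for illustration.
final class RunSizeSketch {
    static final int BITMAP_CONTAINER_BYTES = 8192; // 2^16 bits
    static final int ARRAY_MAX_CARDINALITY = 4096;  // above this, a bitmap container is used

    // Builds the sorted values from, from+1, ..., toExclusive-1.
    static int[] range(int from, int toExclusive) {
        int[] out = new int[toExclusive - from];
        for (int i = 0; i < out.length; i++) out[i] = from + i;
        return out;
    }

    // Counts maximal runs of consecutive values in a sorted, duplicate-free array.
    static int countRuns(int[] sortedValues) {
        if (sortedValues.length == 0) return 0;
        int runs = 1;
        for (int i = 1; i < sortedValues.length; i++) {
            if (sortedValues[i] != sortedValues[i - 1] + 1) runs++;
        }
        return runs;
    }

    // Best payload size when only array and bitmap containers are available.
    static int bytesWithoutRuns(int cardinality) {
        return cardinality <= ARRAY_MAX_CARDINALITY ? 2 * cardinality : BITMAP_CONTAINER_BYTES;
    }

    // Best payload size when a run container (2 + 4 bytes per run) is also allowed.
    static int bytesWithRuns(int[] sortedValues) {
        int runBytes = 2 + 4 * countRuns(sortedValues);
        return Math.min(bytesWithoutRuns(sortedValues.length), runBytes);
    }

    public static void main(String[] args) {
        int[] dense = range(1000, 4000); // 3000 consecutive values, a single run
        System.out.println(bytesWithoutRuns(dense.length)); // 6000
        System.out.println(bytesWithRuns(dense));           // 6

        int[] scattered = new int[3000]; // every other value, so 3000 runs of length 1
        for (int i = 0; i < scattered.length; i++) scattered[i] = 2 * i;
        System.out.println(bytesWithRuns(scattered));       // 6000, runs do not help here
    }
}
```

Because the smaller representation is always chosen, enabling runs can never make a container larger in this model; it only helps when the data contains long runs of consecutive values.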

I realize that one might not want to enable new code paths lightly... but thanks to Gregory's work (and others), RoaringBitmap-0.6.27 has high test coverage (90%). The run-compression approach is used in other important systems like Apache Kylin and Apache Spark with no bug reports (so far... fingers crossed).

The only problem I can imagine is if one creates a Druid database using the latest code and then tries to open it with a Druid engine running on old code (e.g., using Roaring 0.4). The old code won't be able to read the run-optimized format... but even then, you will not get corruption or other random mayhem... you'll just get the Roaring library complaining about an unrecognized data format... and bailing out. I stress that this would only happen to a user who creates the database using recent code and then tries to operate it with very old code... That's not something that anyone should ever do in any case.
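The "complains and bails out" behavior comes down to a header check. The sketch below is a simplified model, not the library's actual reader: the class and method names are hypothetical, and the cookie values (12346 for data without run containers, 12347 in the low 16 bits when run containers are present, with the container count minus one in the high 16 bits) are my understanding of the Roaring serialization format and should be treated as assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Simplified model of an old reader that predates run containers.
// Hypothetical names; cookie values are assumed from the Roaring format spec.
final class CookieCheckSketch {
    static final int COOKIE_NO_RUNS = 12346;
    static final int COOKIE_WITH_RUNS = 12347;

    // Builds a 4-byte little-endian header holding only the cookie word.
    static ByteBuffer header(int cookieWord) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(cookieWord);
        buf.flip();
        return buf;
    }

    // Cookie word for run-optimized data holding the given number of containers.
    static int runCookieWord(int containerCount) {
        return COOKIE_WITH_RUNS | ((containerCount - 1) << 16);
    }

    // The old reader only knows the no-run cookie and refuses anything else.
    static void oldReaderCheck(ByteBuffer in) {
        int cookie = in.order(ByteOrder.LITTLE_ENDIAN).getInt(0);
        if (cookie != COOKIE_NO_RUNS) {
            throw new IllegalStateException("unrecognized data format, cookie=" + (cookie & 0xFFFF));
        }
    }

    static boolean oldReaderAccepts(ByteBuffer in) {
        try {
            oldReaderCheck(in);
            return true;
        } catch (IllegalStateException e) {
            return false; // fails fast instead of misreading the payload
        }
    }

    public static void main(String[] args) {
        System.out.println(oldReaderAccepts(header(COOKIE_NO_RUNS)));   // true
        System.out.println(oldReaderAccepts(header(runCookieWord(3)))); // false
    }
}
```

The point of the model is that the mismatch is detected from the first four bytes, before any container data is interpreted, which is why the failure is a clean error and not silent corruption.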

Anyhow, I think that good performance gains are left on the table with the current code.

This is somewhat related to this PR:
#35
