Description
Currently, Roaring bitmaps are only usable in Druid without run compression. We have RoaringBitmapFactory.DEFAULT_COMPRESS_RUN_ON_SERIALIZATION set to false. I'm all for conservatism, as nobody likes to introduce bugs in complex systems... but in this instance, I can see no practical reason to leave performance gains on the table.
We have demonstrated at length, on various forums and even in a peer-reviewed paper, that run compression makes Roaring generally better...
- Daniel Lemire, Gregory Ssi-Yan-Kai, Owen Kaser, Consistently faster and smaller compressed bitmaps with Roaring, Software: Practice and Experience, 2016. https://arxiv.org/abs/1603.06549
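To make the size argument concrete, here is a rough sketch of the per-container arithmetic as I understand it from the paper: within one chunk of 2^16 values, an array container costs 2 bytes per value (used up to 4096 values), a bitmap container costs a fixed 8192 bytes, and a run container costs 2 + 4r bytes for r runs. This is not the library's code; the class and method names (`ContainerSizeSketch`, `sizeWithRuns`, etc.) are made up for illustration, and container headers are ignored.

```java
/** Back-of-the-envelope container sizes for one Roaring chunk (illustrative only). */
final class ContainerSizeSketch {

    // Counts maximal runs of consecutive values in a sorted, duplicate-free array.
    static int countRuns(int[] sorted) {
        if (sorted.length == 0) return 0;
        int runs = 1;
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i] != sorted[i - 1] + 1) runs++;
        }
        return runs;
    }

    // Array container: one 16-bit value per element.
    static int arrayBytes(int cardinality) { return 2 * cardinality; }

    // Bitmap container: a fixed 2^16 bits.
    static int bitmapBytes() { return 8192; }

    // Run container: a 16-bit run count, then one (start, length) pair of 16-bit values per run.
    static int runBytes(int runs) { return 2 + 4 * runs; }

    // Without run compression: array container up to 4096 values, bitmap container beyond that.
    static int sizeWithoutRuns(int[] sorted) {
        return sorted.length <= 4096 ? arrayBytes(sorted.length) : bitmapBytes();
    }

    // With run compression: keep whichever representation is smaller.
    static int sizeWithRuns(int[] sorted) {
        return Math.min(sizeWithoutRuns(sorted), runBytes(countRuns(sorted)));
    }

    public static void main(String[] args) {
        int[] dense = new int[10000];            // 0..9999: a single long run
        for (int i = 0; i < dense.length; i++) dense[i] = i;

        int[] sparse = new int[1000];            // every 50th value: 1000 runs of length 1
        for (int i = 0; i < sparse.length; i++) sparse[i] = 50 * i;

        // prints "dense:  8192 -> 6 bytes"
        System.out.println("dense:  " + sizeWithoutRuns(dense) + " -> " + sizeWithRuns(dense) + " bytes");
        // prints "sparse: 2000 -> 2000 bytes"
        System.out.println("sparse: " + sizeWithoutRuns(sparse) + " -> " + sizeWithRuns(sparse) + " bytes");
    }
}
```

The point of the second case is that run compression is opportunistic: when runs do not help, the container stays as it was, so the run-optimized size should never be larger under this model.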
I realize that one might not want to enable new code paths lightly... but thanks to Gregory's work (and others), RoaringBitmap-0.6.27 has high test coverage (90%). The run-compression approach is used in other important systems like Apache Kylin and Apache Spark with no bug reports (so far... fingers crossed).
The only problem I can imagine is if one creates a Druid database using the latest code and then tries to open it with a Druid engine running on old code (e.g., using Roaring 0.4). The old code won't be able to read the run-optimized format... but even then, you will not get corruption or other random mayhem... you'll just get the Roaring library complaining about an unrecognized data format... and bailing out. I stress that this would only happen to a user who creates the database using recent code and then tries to operate it with very old code... That's not something that anyone should ever do in any case.
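For illustration, here is roughly why the failure is clean rather than silent. As far as I recall the serialization format, a bitmap starts with a little-endian 32-bit cookie: 12346 when there are no run containers, and 12347 in the low 16 bits (with the container count minus one in the high 16 bits) when run containers may be present. An old reader that only knows the first cookie rejects the second one up front. The sketch below is a hypothetical stand-in for such a reader, not the actual Roaring 0.4 code; `CookieCheckSketch` and its methods are invented names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical stand-in for a pre-run-container reader (not the real library code). */
final class CookieCheckSketch {
    // Cookie values as I recall them from the Roaring format description (assumption).
    static final int COOKIE_NO_RUN = 12346;
    static final int COOKIE_WITH_RUN = 12347;

    // Builds just the 4-byte header that would start a serialized bitmap.
    static byte[] header(boolean runOptimized, int containers) {
        int cookie = runOptimized
                ? COOKIE_WITH_RUN | ((containers - 1) << 16)
                : COOKIE_NO_RUN;
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(cookie).array();
    }

    // An old reader only knows the no-run cookie and refuses anything else.
    static boolean oldReaderAccepts(byte[] serialized) {
        int cookie = ByteBuffer.wrap(serialized).order(ByteOrder.LITTLE_ENDIAN).getInt();
        return cookie == COOKIE_NO_RUN;
    }

    public static void main(String[] args) {
        System.out.println("legacy format accepted: " + oldReaderAccepts(header(false, 3)));
        System.out.println("run format accepted:    " + oldReaderAccepts(header(true, 3)));
    }
}
```

Because the check happens on the very first bytes, the old reader never gets as far as misinterpreting container data.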
Anyhow, I think that good performance gains are left on the table with the current code.
This is somewhat related to this PR:
#35