
Should we store geoip files uncompressed? #28782

@danielmitterdorfer

Current Situation

The ingest-geoip plugin ships with three geoip files:

  • GeoLite2-ASN.mmdb.gz
  • GeoLite2-City.mmdb.gz
  • GeoLite2-Country.mmdb.gz

We load these files lazily to reduce memory usage when this feature is not required (despite the plugin being loaded).

While the MaxMind DB reader can load data either on-heap or off-heap, we effectively have to load it on-heap because we provide an InputStream that decompresses the gzipped data on the fly.

In order to allow loading the data off-heap, we need to provide a file (see the builder.mode parameter which controls whether to load on- or off-heap).
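To illustrate the difference, here is a minimal sketch that uses only JDK classes, not the MaxMind reader API. The class name `GeoIpLoadingSketch`, its methods, and the stand-in payload are hypothetical. It shows why a gzipped stream ends up on the heap (it has to be inflated into a byte array), while an uncompressed file can be memory-mapped into native memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration; not code from the ingest-geoip plugin.
class GeoIpLoadingSketch {

    // Gzipped input: the stream must be inflated completely,
    // so the decompressed bytes live in a heap-allocated array.
    static ByteBuffer loadOnHeap(Path gzipped) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzipped))) {
            return ByteBuffer.wrap(in.readAllBytes());
        }
    }

    // Uncompressed input: the file can be memory-mapped, so the data is
    // backed by native memory (the page cache) instead of the Java heap.
    static ByteBuffer loadOffHeap(Path uncompressed) throws IOException {
        try (FileChannel channel = FileChannel.open(uncompressed, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in payload; a real .mmdb file would be used in practice.
        byte[] payload = "stand-in for .mmdb contents".getBytes(StandardCharsets.UTF_8);
        Path raw = Files.createTempFile("sketch", ".mmdb");
        Path gz = Files.createTempFile("sketch", ".mmdb.gz");
        Files.write(raw, payload);
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(gz))) {
            out.write(payload);
        }

        ByteBuffer heap = loadOnHeap(gz);
        ByteBuffer mapped = loadOffHeap(raw);
        System.out.println("gzip stream buffer is direct: " + heap.isDirect());
        System.out.println("mapped file buffer is direct: " + mapped.isDirect());
        System.out.println("contents are equal: " + heap.equals(mapped));
    }
}
```

Both buffers expose the same bytes; only the backing memory differs. With a file on disk, the choice between the two strategies could be left to a setting, whereas a compressed stream leaves only the on-heap path.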

Discussion Item

Does it make sense to store these files uncompressed instead of gzip-compressed?

Consequences

Positive Consequences

  • We take up less heap memory (roughly 70 MB less). This will benefit users, especially those with small clusters, as well as our own integration tests (we can probably reduce the heap size of our integration test clusters). As a corollary, I expect that we reduce GC pressure a bit as well.
  • When we use a file instead of an input stream, we can also choose whether to load the data on-heap or off-heap. This means that we could provide an (expert) setting for users who still prefer to have the data on-heap.

Negative Consequences

  • Loading the files is still not free: the data are memory-mapped and take up native memory instead of heap.
  • The size of the config directory on disk increases from 34.5 MB to 72.6 MB.
  • The size of the plugin artefact increases from 34.7 MB to 37.6 MB (tested with ingest-geoip-6.2.2.zip).
