Skip to content

Conversation

@danielmitterdorfer
Copy link
Member

@danielmitterdorfer danielmitterdorfer commented Dec 16, 2016

With this commit, we introduce a cache to the geoip ingest processor.
The cache is enabled by default and caches the 1000 most recent items.
The cache size is controlled by the setting ingest.geoip.cache_size.

Closes #22074

With this commit, we introduce a cache to the geoip ingest processor.
The cache is disabled by default but can be enabled by setting
`ingest.geoip.cache_enabled` to `true`. The cache size is controlled
by the setting `ingest.geoip.cache_size`.

Closes elastic#22074
@danielmitterdorfer
Copy link
Member Author

@martijnvg Can you please review this one?


`ingest.geoip.cache_enabled`::

Whether to enable caching of results. Defaults to `false`.
Copy link
Contributor

@dadoonet dadoonet Dec 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should cache by default to have a better OOTB experience.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion with @martijnvg I disabled it by default but I'm happy to change that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this cache is only going to be beneficial if ip addresses re-appear often enough.
I'm not sure if that is the case in most scenarios? That is why I told @danielmitterdorfer that we should disable this cache by default. Otherwise this cache will hold entries that don't get used and eventually are kicked out.

If we think that this assumption if false then I'm ok with enabling this by default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I looked at LS doc: https://www.elastic.co/guide/en/logstash/current/plugins-filters-geoip.html and it's a single setting, 1000 by default. So activated by default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then let's align ES here with the behavior in Logstash. I'll enable it by default then (as per your suggestion with a single setting).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all right, then lets set it 1000 by default.

[[ingest-geoip-settings]]
===== Settings

The geoip processor supports the following settings:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be say that it's "Node Settings"? May be some people will think that you can set that when you define the pipeline?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++


The maximum number of results that should be cached. Defaults to `1000`.

Note that these settings apply to all geoip processors, i.e. there is one cache for all defined processors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is one cache for all defined processors.

to

there is one cache for all defined geoip processors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++


Whether to enable caching of results. Defaults to `false`.

`ingest.geoip.cache_size`::
Copy link
Contributor

@dadoonet dadoonet Dec 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also simply that and have one single setting actually. If ingest.geoip.cache_size == 0, then use a NoCache instance. Otherwise use the one you created.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, neat idea.

@dadoonet
Copy link
Contributor

@danielmitterdorfer I left some comments.
I think we can avoid defining 2 settings actually. Only have the cache size should be enough.

WDYT?

@clintongormley
Copy link
Contributor

Why is this disabled by default? This is a plugin, so the user is only going to install it if they want to use it. I would think that they'd want it to be fast by default.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

I agree with @dadoonet comments and also left an additional question.

throw new IllegalStateException("the geoip directory [" + geoIpConfigDirectory + "] containing databases doesn't exist");
}

NodeCache cache;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now create a cache per database file. So by default we would have two cache with each upto 1000 entries and if a user configured more custom databases we would have more. So lets reuse the cache between files (create the cache in the method calling this method and supply it to here)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I'll change it before merging.

@danielmitterdorfer
Copy link
Member Author

@dadoonet, @martijnvg: I've added further commits that address your comments. Can you please check again?

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@danielmitterdorfer danielmitterdorfer merged commit 655a95a into elastic:master Dec 19, 2016
@danielmitterdorfer
Copy link
Member Author

Thanks @martijnvg for your review! Merged.

danielmitterdorfer added a commit that referenced this pull request Dec 19, 2016
With this commit, we introduce a cache to the geoip ingest processor.
The cache is enabled by default and caches the 1000 most recent items.
The cache size is controlled by the setting `ingest.geoip.cache_size`.

Closes #22074
@clintongormley clintongormley added :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP and removed :Plugin Ingest GeoIp labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement v5.2.0 v6.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants