Cache results of geoip lookups #22231
With this commit, we introduce a cache to the geoip ingest processor. The cache is disabled by default but can be enabled by setting `ingest.geoip.cache_enabled` to `true`. The cache size is controlled by the setting `ingest.geoip.cache_size`. Closes elastic#22074
@martijnvg Can you please review this one?
docs/plugins/ingest-geoip.asciidoc
Outdated
`ingest.geoip.cache_enabled`::
Whether to enable caching of results. Defaults to `false`.
I think we should cache by default to have a better OOTB experience.
After discussion with @martijnvg I disabled it by default but I'm happy to change that.
I think this cache is only going to be beneficial if ip addresses re-appear often enough.
I'm not sure if that is the case in most scenarios? That is why I told @danielmitterdorfer that we should disable this cache by default. Otherwise this cache will hold entries that don't get used and eventually are kicked out.
If we think that this assumption is false, then I'm OK with enabling this by default.
Yeah. I looked at the Logstash docs: https://www.elastic.co/guide/en/logstash/current/plugins-filters-geoip.html. There it's a single setting, 1000 by default, so the cache is activated by default.
Ok, then let's align ES here with the behavior in Logstash. I'll enable it by default then (as per your suggestion with a single setting).
All right, then let's set it to 1000 by default.
docs/plugins/ingest-geoip.asciidoc
Outdated
[[ingest-geoip-settings]]
===== Settings

The geoip processor supports the following settings:
Maybe say that these are "Node Settings"? Otherwise some people may think that you can set them when you define the pipeline.
++
docs/plugins/ingest-geoip.asciidoc
Outdated
The maximum number of results that should be cached. Defaults to `1000`.

Note that these settings apply to all geoip processors, i.e. there is one cache for all defined processors.
Change "there is one cache for all defined processors." to "there is one cache for all defined geoip processors."
++
Whether to enable caching of results. Defaults to `false`.

`ingest.geoip.cache_size`::
We can also simplify that and have a single setting. If ingest.geoip.cache_size == 0, then use a NoCache instance. Otherwise use the one you created.
Hmm, neat idea.
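For readers following the thread, the single-setting idea could look roughly like the sketch below. All names here (`LookupCache`, `NoCache`, `LruLookupCache`, `LookupCaches.forSize`) are hypothetical and chosen for illustration; this is not the code from the PR, and the cached values are plain strings standing in for real lookup results.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical interface: return a cached result or compute it via the loader.
interface LookupCache {
    String get(String ip, Function<String, String> loader);
}

// Chosen when the configured size is 0: every call goes straight to the loader.
final class NoCache implements LookupCache {
    @Override
    public String get(String ip, Function<String, String> loader) {
        return loader.apply(ip);
    }
}

// A small LRU cache built on LinkedHashMap's access-order mode.
final class LruLookupCache implements LookupCache {
    private final Map<String, String> entries;

    LruLookupCache(final int maxSize) {
        this.entries = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Evict the least recently used entry once the limit is exceeded.
                return size() > maxSize;
            }
        };
    }

    @Override
    public synchronized String get(String ip, Function<String, String> loader) {
        String cached = entries.get(ip);
        if (cached == null) {
            cached = loader.apply(ip);
            entries.put(ip, cached);
        }
        return cached;
    }
}

final class LookupCaches {
    // One setting decides both whether caching is on and how large the cache is.
    static LookupCache forSize(int cacheSize) {
        if (cacheSize < 0) {
            throw new IllegalArgumentException("cache size must be >= 0, got " + cacheSize);
        }
        return cacheSize == 0 ? new NoCache() : new LruLookupCache(cacheSize);
    }
}
```

With this shape, callers never branch on whether caching is enabled; they always go through `LookupCache.get`, and a size of 0 simply disables caching.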
@danielmitterdorfer I left some comments. WDYT?
Why is this disabled by default? This is a plugin, so the user is only going to install it if they want to use it. I would think that they'd want it to be fast by default.
martijnvg left a comment
Looks good.
I agree with @dadoonet's comments and also left an additional question.
throw new IllegalStateException("the geoip directory [" + geoIpConfigDirectory + "] containing databases doesn't exist");
}

NodeCache cache;
We now create a cache per database file. So by default we would have two caches, each with up to 1000 entries, and if a user configured more custom databases we would have more. So let's reuse the cache between files (create the cache in the method calling this method and pass it in here)?
Makes sense. I'll change it before merging.
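The suggested change can be pictured with a minimal sketch: the caller creates one cache and hands it to each per-database load call, so the configured size bounds the total number of entries rather than the entries per file. Every name below (`SharedCache`, `SketchReader`, `SketchLoader`) is hypothetical, the lookup is a stand-in, and prefixing keys with the database name is an assumption made here to keep results from different databases apart; the PR itself may solve this differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical bounded LRU cache shared by all readers.
final class SharedCache {
    private final Map<String, String> entries;

    SharedCache(final int maxSize) {
        this.entries = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxSize;
            }
        };
    }

    synchronized String get(String key) { return entries.get(key); }
    synchronized void put(String key, String value) { entries.put(key, value); }
    synchronized int size() { return entries.size(); }
}

// Hypothetical reader for one database file; it does not own the cache.
final class SketchReader {
    private final String databaseName;
    private final SharedCache cache;

    SketchReader(String databaseName, SharedCache cache) {
        this.databaseName = databaseName;
        this.cache = cache;
    }

    SharedCache cache() { return cache; }

    String lookup(String ip) {
        // Assumption: keys carry the database name so files do not collide.
        String key = databaseName + "|" + ip;
        String hit = cache.get(key);
        if (hit != null) {
            return hit;
        }
        String result = databaseName + ":" + ip; // stand-in for a real database lookup
        cache.put(key, result);
        return result;
    }
}

final class SketchLoader {
    // The calling method creates the cache once ...
    static List<SketchReader> loadAll(List<String> databaseNames, int cacheSize) {
        SharedCache cache = new SharedCache(cacheSize);
        List<SketchReader> readers = new ArrayList<>();
        for (String name : databaseNames) {
            readers.add(load(name, cache));
        }
        return readers;
    }

    // ... and the per-file method only receives it.
    static SketchReader load(String databaseName, SharedCache cache) {
        return new SketchReader(databaseName, cache);
    }
}
```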
@dadoonet, @martijnvg: I've added further commits that address your comments. Can you please check again?
martijnvg left a comment
LGTM
Thanks @martijnvg for your review! Merged.
With this commit, we introduce a cache to the geoip ingest processor. The cache is enabled by default and caches the 1000 most recent items. The cache size is controlled by the setting `ingest.geoip.cache_size`. Closes #22074