-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Description
Here's a snippet from GET _nodes/stats for a simple example on my machine:
{
"enrich" : {
"type" : "enrich",
"stats" : {
"count" : 653312,
"time" : "29s",
"time_in_millis" : 29004,
"current" : 0,
"failed" : 0
}
}
},
That enrich processor is costing us 44 microseconds per document. But the dataset in question is actually super cache-able:
"cache_stats" : [
{
"node_id" : "uLluSH_HT8SthV3I9LQPrQ",
"count" : 100,
"hits" : 652965,
"misses" : 347,
"evictions" : 0
}
]
44 microseconds per document is way too high a performance price to paying for cache hits. So why's it so slow?
elasticsearch/x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichCache.java
Lines 129 to 151 in 92ac03f
| private static class CacheKey { | |
| final String enrichIndex; | |
| final SearchRequest searchRequest; | |
| private CacheKey(String enrichIndex, SearchRequest searchRequest) { | |
| this.enrichIndex = enrichIndex; | |
| this.searchRequest = searchRequest; | |
| } | |
| @Override | |
| public boolean equals(Object o) { | |
| if (this == o) return true; | |
| if (o == null || getClass() != o.getClass()) return false; | |
| CacheKey cacheKey = (CacheKey) o; | |
| return enrichIndex.equals(cacheKey.enrichIndex) && searchRequest.equals(cacheKey.searchRequest); | |
| } | |
| @Override | |
| public int hashCode() { | |
| return Objects.hash(enrichIndex, searchRequest); | |
| } | |
| } |
We use the above class for the cache keys for potentially avoiding execution of SearchRequests (no need to execute a SearchRequest if we already know what it will return). But it's not really a great class for that.
First, SearchRequest is mutable. We don't happen to mutate it, but still, it would be better if we were using a properly immutable class in the key.
Second, SearchRequest isn't some trivial class, there's a deep object tree there, and so hashCode and equals actually has quite a bit of work to do (see the definition of hashCode/equals for SearchRequest then click through to source which is a SearchSourceBuilder and scope out its hashCode/equals, then rinse and repeat, also note that all those Objects.equals calls are an array allocation unless something JIT'ed it out).
Doubly-worse, our caching implementation must execute hashCode twice -- once to find the right underlying HashMap to talk to (arguably we control this, it's called viagetCacheSegment), and then a second time within the implementation of HashMap itself (we don't control this).
So, even with a better key, it would still probably be worth it to pre-calculate the hashCode eagerly at construction time and then just return it repeatedly, like how Snapshot already does the same:
elasticsearch/server/src/main/java/org/elasticsearch/snapshots/Snapshot.java
Lines 76 to 83 in 92ac03f
| @Override | |
| public int hashCode() { | |
| return hashCode; | |
| } | |
| private int computeHashCode() { | |
| return Objects.hash(repository, snapshotId); | |
| } |
Meta: I went with >enhancement for this, but it's pretty close to >bug territory in my mind.

