Skip to content

The EnrichCache has suboptimal performance on cache hits #96050

@joegallo

Description

@joegallo

Here's a snippet from GET _nodes/stats for a simple example on my machine:

             {
                "enrich" : {
                  "type" : "enrich",
                  "stats" : {
                    "count" : 653312,
                    "time" : "29s",
                    "time_in_millis" : 29004,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },

That enrich processor is costing us 44 microseconds per document. But the dataset in question is actually super cache-able:

  "cache_stats" : [
    {
      "node_id" : "uLluSH_HT8SthV3I9LQPrQ",
      "count" : 100,
      "hits" : 652965,
      "misses" : 347,
      "evictions" : 0
    }
  ]

44 microseconds per document is way too high a performance price to paying for cache hits. So why's it so slow?

private static class CacheKey {
final String enrichIndex;
final SearchRequest searchRequest;
private CacheKey(String enrichIndex, SearchRequest searchRequest) {
this.enrichIndex = enrichIndex;
this.searchRequest = searchRequest;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
CacheKey cacheKey = (CacheKey) o;
return enrichIndex.equals(cacheKey.enrichIndex) && searchRequest.equals(cacheKey.searchRequest);
}
@Override
public int hashCode() {
return Objects.hash(enrichIndex, searchRequest);
}
}

We use the above class for the cache keys for potentially avoiding execution of SearchRequests (no need to execute a SearchRequest if we already know what it will return). But it's not really a great class for that.

First, SearchRequest is mutable. We don't happen to mutate it, but still, it would be better if we were using a properly immutable class in the key.

Second, SearchRequest isn't some trivial class, there's a deep object tree there, and so hashCode and equals actually has quite a bit of work to do (see the definition of hashCode/equals for SearchRequest then click through to source which is a SearchSourceBuilder and scope out its hashCode/equals, then rinse and repeat, also note that all those Objects.equals calls are an array allocation unless something JIT'ed it out).

Screen Shot 2023-05-11 at 4 55 47 PM

Doubly-worse, our caching implementation must execute hashCode twice -- once to find the right underlying HashMap to talk to (arguably we control this, it's called viagetCacheSegment), and then a second time within the implementation of HashMap itself (we don't control this).

Screen Shot 2023-05-11 at 4 44 20 PM

So, even with a better key, it would still probably be worth it to pre-calculate the hashCode eagerly at construction time and then just return it repeatedly, like how Snapshot already does the same:

@Override
public int hashCode() {
return hashCode;
}
private int computeHashCode() {
return Objects.hash(repository, snapshotId);
}

Meta: I went with >enhancement for this, but it's pretty close to >bug territory in my mind.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions