The EnrichCache has suboptimal performance on cache hits

Here's a snippet from `GET _nodes/stats` for a simple example on my machine:

```
             {
                "enrich" : {
                  "type" : "enrich",
                  "stats" : {
                    "count" : 653312,
                    "time" : "29s",
                    "time_in_millis" : 29004,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
```

That `enrich` processor is costing us 44 microseconds per document. But the dataset in question is actually super cache-able:

```
  "cache_stats" : [
    {
      "node_id" : "uLluSH_HT8SthV3I9LQPrQ",
      "count" : 100,
      "hits" : 652965,
      "misses" : 347,
      "evictions" : 0
    }
  ]
```

44 microseconds per document is way too high a performance price to paying for cache hits. **So why's it so slow?**

https://github.com/elastic/elasticsearch/blob/92ac03fcc5cdcf4ca611f34f96f21e048a378397/x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichCache.java#L129-L151

We use the above class for the cache keys for potentially avoiding execution of `SearchRequest`s (no need to execute a `SearchRequest` if we already know what it will return). But it's not really a great class for that.

First, `SearchRequest` is mutable. We don't **happen** to mutate it, but still, it would be better if we were using a properly immutable class in the key.

Second, `SearchRequest` isn't some trivial class, there's a deep object tree there, and so `hashCode` and `equals` actually has quite a bit of work to do (see the definition of `hashCode`/`equals` for `SearchRequest` then click through to `source` which is a `SearchSourceBuilder` and scope out its `hashCode`/`equals`, then rinse and repeat, also note that all those `Objects.equals` calls are an array allocation unless something JIT'ed it out).

![Screen Shot 2023-05-11 at 4 55 47 PM](https://github.com/elastic/elasticsearch/assets/187034/ed8631d1-eeac-426a-b6a9-95bd76e0a150)

Doubly-worse, our caching implementation must execute `hashCode` twice -- once to find the right underlying `HashMap` to talk to (arguably we control this, it's called via`getCacheSegment`), and then a second time within the implementation of `HashMap` itself (we don't control this).

![Screen Shot 2023-05-11 at 4 44 20 PM](https://github.com/elastic/elasticsearch/assets/187034/15065bf5-3f07-4308-ae98-27a72edbb2f9)

So, even with a better key, it would still probably be worth it to pre-calculate the `hashCode` eagerly at construction time and then just return it repeatedly, like how `Snapshot` already does the same:

https://github.com/elastic/elasticsearch/blob/92ac03fcc5cdcf4ca611f34f96f21e048a378397/server/src/main/java/org/elasticsearch/snapshots/Snapshot.java#L76-L83

Meta: I went with `>enhancement` for this, but it's pretty close to `>bug` territory in my mind.

	private static class CacheKey {

	final String enrichIndex;
	final SearchRequest searchRequest;

	private CacheKey(String enrichIndex, SearchRequest searchRequest) {
	this.enrichIndex = enrichIndex;
	this.searchRequest = searchRequest;
	}

	@Override
	public boolean equals(Object o) {
	if (this == o) return true;
	if (o == null \|\| getClass() != o.getClass()) return false;
	CacheKey cacheKey = (CacheKey) o;
	return enrichIndex.equals(cacheKey.enrichIndex) && searchRequest.equals(cacheKey.searchRequest);
	}

	@Override
	public int hashCode() {
	return Objects.hash(enrichIndex, searchRequest);
	}
	}

	@Override
	public int hashCode() {
	return hashCode;
	}

	private int computeHashCode() {
	return Objects.hash(repository, snapshotId);
	}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

The EnrichCache has suboptimal performance on cache hits #96050

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

The EnrichCache has suboptimal performance on cache hits #96050

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions