Description
Example:
Using Twitter as an example, each user is a document, and each tweet is a document nested under the user. For active users, each document can end up with thousands of tweets and thus a single document can be a few megabytes in size.
{
"userId": "1",
"tweets": [
{
"id": 1,
"message": "tweet 1",
}
{
"id": 2,
"message": "tweet 2"
},
...
]
}
Use Case:
We want to find users who have used a specific hashtag in their tweets and view only those tweets. We use source filtering and nested queries with inner hits to get back just the users and the matching tweets.
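For reference, a minimal sketch of the kind of request described above, written as a Python function that builds the search body. Several details are assumptions not stated in this issue: that `tweets` is mapped as a `nested` field, that the hashtag is matched against `tweets.message`, and that the inner hits are filtered to `tweets.id` and `tweets.message`. The helper name `build_hashtag_query` is hypothetical.

```python
import json


def build_hashtag_query(hashtag, size=10):
    """Build a search body that returns users plus only their matching tweets.

    Assumptions (not confirmed by the issue): `tweets` is a `nested` field
    and the hashtag text lives in `tweets.message`.
    """
    return {
        "size": size,
        # Top-level source filtering: return only the user id, not all tweets.
        "_source": ["userId"],
        "query": {
            "nested": {
                "path": "tweets",
                "query": {"match": {"tweets.message": hashtag}},
                # inner_hits returns the nested tweets that matched,
                # with its own source filter.
                "inner_hits": {"_source": ["tweets.id", "tweets.message"]},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_hashtag_query("#elasticsearch"), indent=2))
```

The response for such a request only contains `userId` and the matching tweets, but that says nothing about how much of the stored source is read on the server side to produce it, which is the subject of this issue.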
Problem:
Even though we use source filtering, Elasticsearch loads the entire document source into memory before applying the filter. Because each document is so large, any real query throughput leads to constant garbage collection, which we see in the logs.
Feature Request:
Could the filtered source be loaded in a more memory-efficient manner, so that the entire source does not have to be loaded into memory first?