
Memory efficient source filtering #25168

@amir20001

Example:

Using Twitter as an example, each user is a document, and each tweet is a document nested under the user. For active users, each document can end up with thousands of tweets and thus a single document can be a few megabytes in size.

{
  "userId": "1",
  "tweets": [
    {
      "id": 1,
      "message": "tweet 1"
    },
    {
      "id": 2,
      "message": "tweet 2"
    },
   ...
  ]
}

Use Case:
We want to find users who have used a specific hashtag in their tweets and view only those tweets. We use source filtering together with a nested query and inner hits to get back just the users and their matching tweets.
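A minimal sketch of the kind of search body this use case implies, written as a Python dict so it can be serialized to JSON. It assumes `tweets` is mapped as a `nested` field and reuses the field names from the example document. The helper name `build_hashtag_search`, the `match` query, the hashtag value, and the `size` of 10 are illustrative assumptions, not taken from our actual queries.

```python
import json


def build_hashtag_search(hashtag, tweets_path="tweets"):
    """Build a search body that returns only userId plus the matching tweets.

    Hypothetical helper: assumes `tweets_path` is mapped as a nested field.
    """
    return {
        # Top-level source filtering: return the user's id, not the tweets array.
        "_source": ["userId"],
        "query": {
            "nested": {
                "path": tweets_path,
                # Illustrative match on the tweet text; the real query may differ.
                "query": {"match": {f"{tweets_path}.message": hashtag}},
                # inner_hits returns the nested tweets that matched, per user.
                "inner_hits": {"size": 10},
            }
        },
    }


body = build_hashtag_search("#elasticsearch")
print(json.dumps(body, indent=2))
```

The response for such a request carries one hit per user, with the matching tweets under that hit's inner hits rather than in the filtered `_source`.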

Problem:
Even though we use source filtering, Elasticsearch loads the entire document into memory before applying the filter. Because each document is so large, at any real throughput we see constant garbage collection in the logs.
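To make the cost concrete, here is a simplified model of the load-then-filter pattern described above. It is not Elasticsearch's actual code path: `load_then_filter` is a hypothetical stand-in, and the document is synthetic, shaped like the example with 5000 tweets.

```python
import json


def load_then_filter(raw_source, includes):
    """Model of load-then-filter: parse everything, then keep only `includes`."""
    # Step 1: the whole document is parsed, so every tweet is materialized.
    full = json.loads(raw_source)
    # Step 2: only now are the unwanted fields dropped.
    return {key: full[key] for key in includes if key in full}


# Synthetic document: one user with many nested tweets.
doc = {
    "userId": "1",
    "tweets": [{"id": i, "message": f"tweet {i}"} for i in range(1, 5001)],
}
raw = json.dumps(doc).encode("utf-8")

filtered = load_then_filter(raw, ["userId"])
print(len(raw), "bytes parsed to return", len(json.dumps(filtered)), "bytes")
```

In this model the temporary allocation scales with the size of the full document, not with the size of the filtered result, which is the pattern that would produce the garbage collection pressure we see.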

Feature Request:
Can filtered source be loaded in a more memory-efficient manner, without having to load the entire source into memory first?

Metadata

Labels

:Core/Infra/Scripting (Scripting abstractions, Painless, and Mustache)
:Search/Search (Search-related issues that do not fall into other categories)
>enhancement
Team:Core/Infra (Meta label for core/infra team)
Team:Search (Meta label for search team)
