Design discussion for the ML dataframe builder: https://github.com/elastic/elasticsearch/tree/feature/fib
Work in progress.
Intro
A dataframe builder job uses a persistent task that can be created/deleted/started/stopped. This issue discusses ways to store the configuration of such a job.
Configuration payload
A configuration object is supposed to be small: it contains fields like the source and destination index and a list of aggregation configurations (which makes parsing tightly coupled to aggregations). In the future there will likely be more options, but the object should remain relatively small.
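A minimal sketch of what such a configuration object might look like, in plain Java. The class and field names (DataFrameJobConfig, sourceIndex, destIndex) are illustrative only, and the aggregation configs are kept as opaque strings for brevity; the real class would additionally implement the serialization interfaces required by whichever store is chosen below.

```java
// Hypothetical sketch of the configuration payload; names are illustrative.
import java.util.List;
import java.util.Objects;

public final class DataFrameJobConfig {
    private final String jobId;              // unique id of the job
    private final String sourceIndex;        // index the dataframe is built from
    private final String destIndex;          // index the dataframe is written to
    private final List<String> aggregations; // aggregation configs, kept opaque here

    public DataFrameJobConfig(String jobId, String sourceIndex, String destIndex,
                              List<String> aggregations) {
        this.jobId = Objects.requireNonNull(jobId);
        this.sourceIndex = Objects.requireNonNull(sourceIndex);
        this.destIndex = Objects.requireNonNull(destIndex);
        this.aggregations = List.copyOf(aggregations);
    }

    public String jobId() { return jobId; }
    public String sourceIndex() { return sourceIndex; }
    public String destIndex() { return destIndex; }
    public List<String> aggregations() { return aggregations; }
}
```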
Problem
The biggest risk is breaking backwards compatibility and/or creating a roadblock for deprecating aggregations.
Solutions
A Do nothing - use cluster state
Use the persistent task to store the configuration, which means the config is stored in the cluster state and needs to be read back when a node starts (see the sketch after the pro/con list).
pro:
- simplest
con:
- if the configuration becomes invalid, parsing breaks and the node does not start
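To make the con concrete, here is a minimal sketch (plain Java, hypothetical class and field names) of the kind of strict validation that runs while the cluster state is deserialized; any unknown or removed field throws, and in option A that exception surfaces during node start.

```java
// Sketch only: strict validation as it would run during cluster state parsing.
import java.util.Map;
import java.util.Set;

public final class StrictConfigParser {
    private static final Set<String> KNOWN_FIELDS =
            Set.of("job_id", "source_index", "dest_index", "aggregations");

    /** Throws if the stored config contains a field this node version does not know. */
    public static void validate(Map<String, Object> storedConfig) {
        for (String field : storedConfig.keySet()) {
            if (KNOWN_FIELDS.contains(field) == false) {
                // In option A this exception propagates out of cluster state
                // deserialization and the node fails to start.
                throw new IllegalArgumentException("unknown field [" + field + "]");
            }
        }
    }
}
```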
B Careful Parsing
Same as A, but wrap the parsing code with error handling so that a broken config does not cause total failure (see the sketch below).
pro:
- does not fail if the configuration becomes invalid
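A sketch of what "careful parsing" could look like, reusing the hypothetical classes above: the strict parse is wrapped, a failure is logged, and the job is reported as unusable instead of taking the node down.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.logging.Logger;

public final class LenientConfigParser {
    private static final Logger LOGGER = Logger.getLogger(LenientConfigParser.class.getName());

    /**
     * Returns an empty Optional instead of propagating a parse failure, so a
     * broken stored config marks the job as unusable rather than failing the node.
     */
    public static Optional<DataFrameJobConfig> tryParse(Map<String, Object> storedConfig) {
        try {
            return Optional.of(parse(storedConfig));
        } catch (RuntimeException e) {
            LOGGER.warning("invalid dataframe job configuration, ignoring: " + e.getMessage());
            return Optional.empty();
        }
    }

    @SuppressWarnings("unchecked")
    private static DataFrameJobConfig parse(Map<String, Object> storedConfig) {
        // Strict parsing as in option A; any RuntimeException (unknown field,
        // missing field, wrong type) is caught by tryParse above.
        StrictConfigParser.validate(storedConfig);
        return new DataFrameJobConfig(
                (String) storedConfig.get("job_id"),
                (String) storedConfig.get("source_index"),
                (String) storedConfig.get("dest_index"),
                (List<String>) storedConfig.getOrDefault("aggregations", List.of()));
    }
}
```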
C Deferred parsing
Store the config as an opaque blob to avoid parsing it at startup, and parse it only when the persistent task starts (see the sketch below).
pro:
- does not fail if the configuration becomes invalid
- does not slow down startup
con:
- ugly blob in cluster state
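One way to picture option C, assuming (purely for illustration) that the blob is base64-encoded JSON carried as a plain string in the task params. Nothing is decoded or parsed while the cluster state itself is read, only when the task starts, so a broken config stays local to the job.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public final class DeferredConfig {
    // The cluster state only carries this opaque string; it is not touched
    // while the cluster state is deserialized on node start.
    private final String configBlob;

    public DeferredConfig(String rawJson) {
        this.configBlob = Base64.getEncoder()
                .encodeToString(rawJson.getBytes(StandardCharsets.UTF_8));
    }

    /** Called only when the persistent task starts; parse failures stay local to the job. */
    public String decodeJson() {
        return new String(Base64.getDecoder().decode(configBlob), StandardCharsets.UTF_8);
    }
}
```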
D Store configuration in index, only keep IDs in cluster state
Store only the (unique) job id in the cluster state and keep the configs in a separate private index (see the sketch below).
pro:
- does not fail if the configuration becomes invalid
- does not slow down startup
- does not grow the cluster state with configuration payloads
con:
- extra logic and error handling needed
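A sketch of option D with a hypothetical ConfigStore abstraction standing in for a get-by-id against the private config index; only the job id travels in the cluster state, and the extra error handling listed under con shows up as the missing-document case.

```java
import java.util.Optional;

public final class IndexBackedConfigLookup {
    /** Hypothetical abstraction over a get-by-id against the private config index. */
    public interface ConfigStore {
        Optional<DataFrameJobConfig> load(String jobId);
    }

    private final ConfigStore store;

    public IndexBackedConfigLookup(ConfigStore store) {
        this.store = store;
    }

    /**
     * The cluster state only carries the job id; the full config is fetched
     * from the private index when the task starts. The missing-config case is
     * the extra error handling options A-C never had to deal with.
     */
    public DataFrameJobConfig loadOrFail(String jobId) {
        return store.load(jobId).orElseThrow(() -> new IllegalStateException(
                "config for job [" + jobId + "] not found in config index"));
    }
}
```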