[ML-Dataframe] Job configuration storage design discussion for dataframe builder #33952

Design discussion for the ML dataframe builder: https://github.com/elastic/elasticsearch/tree/feature/fib

Work in progress.

### Intro

A dataframe builder job uses a persistent task that can be created, deleted, started and stopped. This issue discusses ways to store the configuration of a dataframe builder job.

#### Configuration payload

A configuration object is supposed to be small. It contains fields like the source and destination index and a list of aggregation configurations (which makes parsing tightly coupled to aggregations). In the future there will likely be more options, but it will remain relatively small.

#### Problem

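The biggest risk is breaking backwards compatibility and/or creating a roadblock for deprecating aggregations: because parsing is coupled to aggregations, a stored configuration can become invalid when the aggregation code changes.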

#### Solutions

##### A Do nothing - use cluster state

Use the persistent task to store the configuration, which means the config is stored in the cluster state and needs to be read back on cluster start.

*pro*:
- simplest

*con*:
- if the configuration becomes invalid, parsing breaks and the node does not start

##### B Careful Parsing

Same as A, but wrap the parsing code with error handling so that an invalid configuration does not cause a total failure.

*pro*:
- does not fail if the configuration becomes invalid
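
A minimal sketch of what careful parsing could look like, in plain Java rather than actual Elasticsearch code. `JobConfig`, `CarefulLoader` and the toy `source->dest` format are hypothetical names and assumptions made for illustration only; the real configuration would be parsed from the persistent task in the cluster state.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the real job configuration class.
final class JobConfig {
    final String source;
    final String dest;

    JobConfig(String source, String dest) {
        this.source = source;
        this.dest = dest;
    }

    // Strict parser for a toy "source->dest" format; throws on invalid input.
    static JobConfig parse(String raw) {
        String[] parts = raw.split("->");
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException("invalid job configuration: " + raw);
        }
        return new JobConfig(parts[0], parts[1]);
    }
}

// Hypothetical loader that reads all stored configurations on startup.
final class CarefulLoader {
    final Map<String, JobConfig> valid = new LinkedHashMap<>();
    final List<String> failedJobIds = new ArrayList<>();

    // A broken configuration is recorded and skipped instead of
    // aborting the whole load.
    void loadAll(Map<String, String> storedConfigs) {
        for (Map.Entry<String, String> entry : storedConfigs.entrySet()) {
            try {
                valid.put(entry.getKey(), JobConfig.parse(entry.getValue()));
            } catch (IllegalArgumentException ex) {
                failedJobIds.add(entry.getKey());
            }
        }
    }
}
```

In this sketch a job with an unparseable configuration ends up in `failedJobIds` while all other jobs still load, which is the behaviour option B aims for.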

##### C Deferred parsing

Store the config in a blob to avoid parsing at startup, and parse it only when the persistent job is started.

*pro*:
- does not fail if the configuration becomes invalid
- does not slow down startup

*con*:
- ugly blob in cluster state
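
Deferred parsing could be sketched roughly as follows, again in plain Java and not as actual Elasticsearch code. `DeferredJobTask` and the toy `source->dest` blob format are assumptions for illustration; the point is only that the constructor never looks inside the blob and parsing happens in `start()`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical task that keeps its configuration as an opaque blob.
final class DeferredJobTask {
    private final String jobId;
    private final byte[] rawConfig; // never parsed when the task is loaded
    private String[] parsed;        // filled in by start()

    DeferredJobTask(String jobId, byte[] rawConfig) {
        this.jobId = jobId;
        this.rawConfig = rawConfig.clone();
    }

    boolean isParsed() {
        return parsed != null;
    }

    // Parsing happens only here, so a broken blob affects just this job.
    void start() {
        String text = new String(rawConfig, StandardCharsets.UTF_8);
        String[] parts = text.split("->");
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalStateException("job [" + jobId + "] has an invalid configuration");
        }
        parsed = parts;
    }

    String source() {
        if (parsed == null) {
            throw new IllegalStateException("job [" + jobId + "] has not been started");
        }
        return parsed[0];
    }
}
```

Loading a task with a broken blob succeeds in this sketch; the error only surfaces when that particular job is started.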

##### D Store configuration in index, only keep IDs in cluster state

Store only the (unique) job ID in the cluster state and store the configs in a separate private index.

*pro*:
- does not fail if the configuration becomes invalid
- does not slow down startup
- does not pollute cluster state (size)

*con*:
- extra logic, error handling
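
The split between cluster state and index might look like the sketch below. It uses in-memory collections as stand-ins, so `ConfigIndex` and `JobRegistry` are hypothetical names and nothing here is the real Elasticsearch API. It also hints at the extra error handling listed as a con: the ID can exist while the configuration document is missing.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// In-memory stand-in for the private configuration index (hypothetical).
final class ConfigIndex {
    private final Map<String, String> docs = new HashMap<>();

    void put(String jobId, String config) {
        docs.put(jobId, config);
    }

    Optional<String> get(String jobId) {
        return Optional.ofNullable(docs.get(jobId));
    }
}

// Hypothetical registry: only job IDs live in the "cluster state".
final class JobRegistry {
    final Set<String> jobIdsInClusterState = new LinkedHashSet<>();
    private final ConfigIndex index;

    JobRegistry(ConfigIndex index) {
        this.index = index;
    }

    void createJob(String jobId, String config) {
        if (jobIdsInClusterState.contains(jobId)) {
            throw new IllegalArgumentException("job id already exists: " + jobId);
        }
        index.put(jobId, config);
        jobIdsInClusterState.add(jobId);
    }

    // Called when the job starts; the config is fetched from the index,
    // not from the cluster state.
    String loadConfigForStart(String jobId) {
        if (!jobIdsInClusterState.contains(jobId)) {
            throw new IllegalArgumentException("unknown job: " + jobId);
        }
        return index.get(jobId).orElseThrow(
            () -> new IllegalStateException("configuration document missing for job: " + jobId));
    }
}
```

The cluster state in this sketch grows by one ID per job regardless of how large the configuration is, and the raw configuration is never read at cluster start.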
