Skip to content

Conversation

@davidkyle
Copy link
Member

Mappings for job and datafeed configurations and the index template.
For moving the configuration classes out of the clusterstate

@davidkyle davidkyle added >feature review :ml Machine learning labels Aug 2, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

Copy link

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments and questions.

* is stored
* @return The index name
*/
public static String jobConfigIndexName() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just configIndexName(), as it will also store datafeed configs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

.field(TYPE, KEYWORD)
.endObject()
.startObject(Detector.CUSTOM_RULES_FIELD.getPreferredName())
.field(TYPE, NESTED)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the nested document functionality here? If not then OBJECT would be lighter-weight. (Maybe at this level NESTED is useful - I'm not sure.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Custom rules are an array which is why I used nested

.field(ENABLED, false)
.endObject()
.startObject(DetectionRule.CONDITIONS_FIELD.getPreferredName())
.field(TYPE, NESTED)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at this level NESTED is definitely overkill and should be OBJECT.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again conditions are an array. It's arguable that the extensive mappings are overkill anyway, not all fields need to be searchable and these mappings come with a maintenance burden. Is it conceivable that someone will want to search for a job where RuleCondition.AppliesTo == ACTUAL && RuleCondition.Operator == GT

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there will be relatively few documents in the index, so we can get away with high-cost mappings. We have NESTED in our results index where it is much more costly due to the higher document volume and it's never been found to be the root cause of any problem, so it's probably fine.

.endObject()
.endObject()
.startObject(Detector.DETECTOR_INDEX.getPreferredName())
.field(TYPE, LONG)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is INTEGER in other mappings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

public static boolean isValidFieldName(String fieldName) {
String[] segments = DOT_PATTERN.split(fieldName);
return !RESERVED_FIELD_NAMES.contains(segments[0]);
return RESERVED_RESULT_FIELD_NAMES.contains(segments[0]) == false && RESERVED_CONFIG_FIELD_NAMES.contains(segments[0]) == false;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary for the config field names to be included here? It will prevent fields being added to results documents just because they clash with a config field name. But that's not necessary because the config and results are in different indices. Is there another reason?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

.startObject(Job.RESULTS_INDEX_NAME.getPreferredName())
.field(TYPE, KEYWORD)
.endObject()
.startObject(Job.DELETED.getPreferredName()) // TODO can this be removed?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting question. At the moment this is what other APIs use to determine that a job is in the process of being deleted. Maybe storing that flag in an index won't be sufficiently atomic in the future. An alternative might be to make the job deletion process a persistent task, and use the existence of that persistent task to determine whether a job is being deleted. This field is probably just one of several places where indices won't give the same ordering guarantee that cluster state gave us.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the loose eventually consistent guarantees this field is useless, we have considered other ways to mark the fact that a job is in the process of deletion I'd like to remove this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the DELETED field from the mapping

Copy link

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>feature :ml Machine learning

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants