Skip to content

Conversation

@dimitris-athanasiou
Copy link
Contributor

While CustomProcessor is generic and allows for flexibility, there
are new requirements that make cross validation a concept it's hard
to abstract behind custom processor. In particular, we would like to
add data_counts to the DFA jobs stats. Counting training VS. test
docs would be a useful statistic. We would also want to add a
different cross validation strategy for multiclass classification.

This commit renames custom processors to cross validation splitters
which allows for those enhancements without cryptically doing
things as a side effect of the abstract custom processing.

While `CustomProcessor` is generic and allows for flexibility, there
are new requirements that make cross validation a concept it's hard
to abstract behind custom processor. In particular, we would like to
add data_counts to the DFA jobs stats. Counting training VS. test
docs would be a useful statistic. We would also want to add a
different cross validation strategy for multiclass classification.

This commit renames custom processors to cross validation splitters
which allows for those enhancements without cryptically doing
things as a side effect of the abstract custom processing.
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

@dimitris-athanasiou
Copy link
Contributor Author

run elasticsearch-ci/packaging-sample-matrix-unix

@dimitris-athanasiou
Copy link
Contributor Author

run elasticsearch-ci/2

@przemekwitek przemekwitek self-requested a review March 23, 2020 08:04
Copy link
Contributor

@przemekwitek przemekwitek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dimitris-athanasiou dimitris-athanasiou merged commit adcf25e into elastic:master Mar 23, 2020
@dimitris-athanasiou dimitris-athanasiou deleted the refactor-dfa-custom-processor-to-cross-validation-splitter branch March 23, 2020 08:37
dimitris-athanasiou added a commit that referenced this pull request Mar 23, 2020
…53915) (#53956)

While `CustomProcessor` is generic and allows for flexibility, there
are new requirements that make cross validation a concept it's hard
to abstract behind custom processor. In particular, we would like to
add data_counts to the DFA jobs stats. Counting training VS. test
docs would be a useful statistic. We would also want to add a
different cross validation strategy for multiclass classification.

This commit renames custom processors to cross validation splitters
which allows for those enhancements without cryptically doing
things as a side effect of the abstract custom processing.

Backport of #53915
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants