[ML-Dataframe] add basic configuration #33813
Pinging @elastic/ml-core
Updated from 2a6d7ee to c17d138
if (taskOperationFailures.isEmpty() == false) {
throw org.elasticsearch.ExceptionsHelper
.convertToElastic(taskOperationFailures.get(0).getCause());
throw org.elasticsearch.ExceptionsHelper.convertToElastic(taskOperationFailures.get(0).getCause());
You could import org.elasticsearch.ExceptionsHelper; then you wouldn't need the org.elasticsearch. prefix here or two lines below.
private final SourceConfig sourceConfig;
private final AggregationConfig aggregationConfig;

private static final ConstructingObjectParser<FeatureIndexBuilderJobConfig, String> PARSER = new ConstructingObjectParser<>(NAME, false,
Experience has shown that for stored configs it's best to have both a lenient parser that ignores unknown fields and a strict parser that throws an error on encountering an unknown field. See for example AnalysisConfig.LENIENT_PARSER and AnalysisConfig.STRICT_PARSER. Then use the strict parser when parsing user-supplied requests and the lenient parser when parsing configs retrieved from storage. That way things don't completely break in a mixed-version cluster when the newer version has added an extra field to the config.
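The lenient/strict split can be illustrated without the Elasticsearch parser classes. The sketch below is a plain-Java stand-in, not the real ConstructingObjectParser API: JobConfig, parseStrict, parseLenient, and the JSON field names index_pattern and destination_index are all hypothetical. One parse routine runs in two modes: strict rejects unknown fields (user requests), lenient skips them (stored configs, possibly written by a newer node).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class LenientStrictParsing {

    // Hypothetical stand-in for a stored job config (not the real FeatureIndexBuilderJobConfig).
    static final class JobConfig {
        final String indexPattern;
        final String destinationIndex;

        JobConfig(String indexPattern, String destinationIndex) {
            this.indexPattern = indexPattern;
            this.destinationIndex = destinationIndex;
        }
    }

    // Fields this (older) node understands; the JSON names are assumptions.
    static final Set<String> KNOWN_FIELDS = Set.of("index_pattern", "destination_index");

    // strict  -> fail on unknown fields (user-supplied requests)
    // lenient -> silently skip unknown fields (configs read back from storage)
    static JobConfig parse(Map<String, Object> source, boolean strict) {
        if (strict) {
            for (String field : source.keySet()) {
                if (KNOWN_FIELDS.contains(field) == false) {
                    throw new IllegalArgumentException("unknown field [" + field + "]");
                }
            }
        }
        return new JobConfig((String) source.get("index_pattern"), (String) source.get("destination_index"));
    }

    static JobConfig parseStrict(Map<String, Object> source) {
        return parse(source, true);
    }

    static JobConfig parseLenient(Map<String, Object> source) {
        return parse(source, false);
    }

    public static void main(String[] args) {
        // A config written by a newer node that added an extra field.
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("index_pattern", "source-*");
        stored.put("destination_index", "dest");
        stored.put("added_in_newer_version", true);

        JobConfig config = parseLenient(stored);
        System.out.println(config.indexPattern + " -> " + config.destinationIndex); // source-* -> dest

        try {
            parseStrict(stored);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // unknown field [added_in_newer_version]
        }
    }
}
```

In the real code the two modes would presumably be two parser instances differing only in their ignore-unknown-fields flag, as with the AnalysisConfig parsers mentioned above.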
this.sources = new ArrayList<>(num);
for (int i = 0; i < num; i++) {
CompositeValuesSourceBuilder<?> builder = CompositeValuesSourceParserHelper.readFrom(in);
getSources().add(builder);
It would be nice if this class was immutable, and this line shows why it isn't currently. This is how to make it immutable:
List<CompositeValuesSourceBuilder<?>> sources = new ArrayList<>(num);
for (int i = 0; i < num; i++) {
    CompositeValuesSourceBuilder<?> builder = CompositeValuesSourceParserHelper.readFrom(in);
    sources.add(builder);
}
this.sources = Collections.unmodifiableList(sources);
}

public SourceConfig(List<CompositeValuesSourceBuilder<?>> sources) {
this.sources = sources;
For immutability:
this.sources = Collections.unmodifiableList(new ArrayList<>(sources));
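Both immutability suggestions can be combined in one small class. This is a simplified sketch, not the PR's SourceConfig: each source is a String instead of a CompositeValuesSourceBuilder<?>, and java.io.DataInputStream/DataOutputStream stand in for Elasticsearch's StreamInput/StreamOutput. The class name ImmutableSourceConfig and the roundTrip helper are hypothetical. The list constructor makes a defensive copy before wrapping; the stream constructor fills a local list and assigns the wrapped list once.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ImmutableSourceConfig {

    // Simplified: Strings stand in for CompositeValuesSourceBuilder<?> instances.
    private final List<String> sources;

    // Defensive copy, then wrap: later changes to the caller's list cannot leak in.
    public ImmutableSourceConfig(List<String> sources) {
        this.sources = Collections.unmodifiableList(new ArrayList<>(sources));
    }

    // Fill a local list first; the field is assigned exactly once, already read-only.
    public ImmutableSourceConfig(DataInputStream in) throws IOException {
        int num = in.readInt();
        List<String> sources = new ArrayList<>(num);
        for (int i = 0; i < num; i++) {
            sources.add(in.readUTF());
        }
        this.sources = Collections.unmodifiableList(sources);
    }

    public void writeTo(DataOutputStream out) throws IOException {
        out.writeInt(sources.size());
        for (String source : sources) {
            out.writeUTF(source);
        }
    }

    public List<String> getSources() {
        return sources;
    }

    // Serialize and deserialize in memory (hypothetical helper for the demo).
    static ImmutableSourceConfig roundTrip(ImmutableSourceConfig config) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            config.writeTo(new DataOutputStream(bytes));
            return new ImmutableSourceConfig(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>(List.of("terms:user", "date_histogram:@timestamp"));
        ImmutableSourceConfig config = new ImmutableSourceConfig(input);
        input.clear(); // does not affect config

        ImmutableSourceConfig copy = roundTrip(config);
        System.out.println(copy.getSources()); // [terms:user, date_histogram:@timestamp]
        try {
            copy.getSources().add("histogram:price");
        } catch (UnsupportedOperationException e) {
            System.out.println("sources are read-only");
        }
    }
}
```

Because getSources() returns the wrapped list, callers can no longer mutate the config, which is what makes the getSources().add(builder) pattern in the original stream constructor impossible.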
@Before
public void setUp() throws Exception {
super.setUp();
It's unusual to override the base class @Before method and then call the base class version first. JUnit will by default call all the base class @Before methods before the derived class @Before methods, so it would be more usual to just give the derived class @Before method a different name and let JUnit call the two methods in the standard order, avoiding the need to call super.setUp() here.
public static AggregationConfig randonAggregationConfig() {
AggregatorFactories.Builder builder = new AggregatorFactories.Builder();

for (int i = 1; i < randomIntBetween(1, 20); ++i) {
Is it valid for the number of aggregator factories to be 0? If not maybe the AggregationConfig class itself should validate that it has at least 1?
Good catch, it should be for (int i = 0 ...
Nevertheless, AggregationConfig lacks validation; that will be part of an upcoming PR.
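A sketch of the corrected loop. Starting at 0 guarantees at least one aggregation; the sketch also draws the count once, because a call written inside the for condition is re-evaluated on every iteration, so the original bound changes as the loop runs. Strings stand in for aggregation builders, and randomIntBetween here is a local hypothetical stand-in, assumed inclusive on both ends like the test framework's helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RandomAggregations {

    private static final Random RANDOM = new Random();

    // Hypothetical stand-in for the test framework's randomIntBetween(min, max), inclusive bounds.
    static int randomIntBetween(int min, int max) {
        return min + RANDOM.nextInt(max - min + 1);
    }

    // Strings stand in for the aggregation builders added to AggregatorFactories.Builder.
    static List<String> randomAggregationNames() {
        // Draw the count once instead of re-drawing it in the loop condition.
        int numAggs = randomIntBetween(1, 20);
        List<String> aggs = new ArrayList<>(numAggs);
        // Start at 0 so that numAggs == 1 still yields one aggregation.
        for (int i = 0; i < numAggs; ++i) {
            aggs.add("agg_" + i);
        }
        return aggs;
    }

    public static void main(String[] args) {
        System.out.println(randomAggregationNames().size()); // always between 1 and 20
    }
}
```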
public class SourceConfigTests extends AbstractSerializingFeatureIndexBuilderTestCase<SourceConfig> {

public static SourceConfig randomSourceConfig() {
int numSources = randomIntBetween(1, 10);
Is it valid for the number of sources to be 0? If not maybe the SourceConfig class itself should validate that it has at least 1?
The for loop below counts from 0, so there will be at least one source for this test.
SourceConfig lacks validation at the moment; this will be addressed in an upcoming PR.
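The validation both reviewers' questions point toward could live in the constructor. The sketch below is hypothetical (ValidatedSourceConfig is not a class in the PR, and Strings again stand in for the composite sources); it rejects a null or empty list so that no SourceConfig without sources can be built, and the same check would apply to AggregationConfig.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

public class ValidatedSourceConfig {

    private final List<String> sources;

    public ValidatedSourceConfig(List<String> sources) {
        // Reject missing or empty input at construction time.
        Objects.requireNonNull(sources, "sources must not be null");
        if (sources.isEmpty()) {
            throw new IllegalArgumentException("at least one source is required");
        }
        this.sources = Collections.unmodifiableList(new ArrayList<>(sources));
    }

    public List<String> getSources() {
        return sources;
    }

    public static void main(String[] args) {
        System.out.println(new ValidatedSourceConfig(List.of("terms:user")).getSources()); // [terms:user]
        try {
            new ValidatedSourceConfig(List.of());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // at least one source is required
        }
    }
}
```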
@AwaitsFix(bugUrl="https://github.com/elastic/elasticsearch/issues/33942")
public class AggregationConfigTests extends AbstractSerializingFeatureIndexBuilderTestCase<AggregationConfig> {

public static AggregationConfig randonAggregationConfig() {
typo: randon -> random
indexPattern = in.readString();
destinationIndex = in.readString();
sourceConfig = in.readOptionalWriteable(SourceConfig::new);
aggregationConfig = in.readOptionalWriteable(AggregationConfig::new);
It seems like there are unnecessarily many levels where null is allowed. You're allowing aggregationConfig to be null here, but also in AggregationConfig aggregatorFactoryBuilder is allowed to be null. I think at most one of these possibilities should be allowed.
Fixed in AggregationConfig. It's not a goal at the moment, but in future I can see this builder working without aggregations.
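A sketch of allowing null at only one level, matching the resolution above: the job-level aggregationConfig stays optional (leaving room for a future job without aggregations), while AggregationConfig requires its content. It uses java.io streams and a boolean presence flag written before the optional value, which is the general encoding idea behind writeOptionalWriteable/readOptionalWriteable; the classes here are simplified hypothetical stand-ins, and a String replaces AggregatorFactories.Builder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

public class OptionalAtOneLevel {

    // Inner config: its content is mandatory, so it never needs a null check.
    static final class AggregationConfig {
        final String aggregations; // hypothetical stand-in for AggregatorFactories.Builder

        AggregationConfig(String aggregations) {
            this.aggregations = Objects.requireNonNull(aggregations, "aggregations must not be null");
        }
    }

    // Outer config: the single place where "no aggregations" can be expressed.
    static final class JobConfig {
        final String destinationIndex;
        final AggregationConfig aggregationConfig; // null means "no aggregations"

        JobConfig(String destinationIndex, AggregationConfig aggregationConfig) {
            this.destinationIndex = Objects.requireNonNull(destinationIndex);
            this.aggregationConfig = aggregationConfig;
        }

        void writeTo(DataOutputStream out) throws IOException {
            out.writeUTF(destinationIndex);
            // Presence flag first, then the value only if present.
            out.writeBoolean(aggregationConfig != null);
            if (aggregationConfig != null) {
                out.writeUTF(aggregationConfig.aggregations);
            }
        }

        static JobConfig readFrom(DataInputStream in) throws IOException {
            String destinationIndex = in.readUTF();
            AggregationConfig aggregationConfig = in.readBoolean() ? new AggregationConfig(in.readUTF()) : null;
            return new JobConfig(destinationIndex, aggregationConfig);
        }
    }

    // Serialize and deserialize in memory.
    static JobConfig roundTrip(JobConfig config) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            config.writeTo(new DataOutputStream(bytes));
            return JobConfig.readFrom(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        JobConfig present = roundTrip(new JobConfig("dest", new AggregationConfig("avg:price")));
        JobConfig absent = roundTrip(new JobConfig("dest", null));
        System.out.println(present.aggregationConfig.aggregations); // avg:price
        System.out.println(absent.aggregationConfig == null);       // true
    }
}
```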
}

AggregationConfig(final StreamInput in) throws IOException {
aggregatorFactoryBuilder = in.readOptionalWriteable(AggregatorFactories.Builder::new);
It seems like there are unnecessarily many levels where null is allowed. You're allowing aggregatorFactoryBuilder to be null here, but also in FeatureIndexBuilderJobConfig aggregationConfig is allowed to be null. I think at most one of these possibilities should be allowed.
@droberts195 Thanks for the review; I addressed most of your comments. I plan to tackle the rest in separate PRs, as I have too many open tasks at the moment.
droberts195 left a comment
OK cool, since it's only going to a feature branch, I'm happy to leave everything else for future PRs.
FEATURE BRANCH PR
Adds basic configuration for the feature index builder job. It reads the source, destination index, aggregation source and aggregation config.
Note:
AggregationConfig awaits a fix upstream; see #33942.