Skip to content

Conversation

@hendrikmuhs
Copy link

@hendrikmuhs hendrikmuhs commented Sep 18, 2018

FEATURE BRANCH PR

add basic configuration for feature index builder job, reads the source, destination index, aggregation source and aggregation config

Note:

The AggregationConfig awaits a fix upstream, see #33942

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

@hendrikmuhs hendrikmuhs changed the title [ML-FIB] add basic configuration [ML-Dataframe] add basic configuration Sep 19, 2018
@hendrikmuhs hendrikmuhs force-pushed the feature/fib-basic-configuration branch from 2a6d7ee to c17d138 Compare September 25, 2018 09:14
if (taskOperationFailures.isEmpty() == false) {
throw org.elasticsearch.ExceptionsHelper
.convertToElastic(taskOperationFailures.get(0).getCause());
throw org.elasticsearch.ExceptionsHelper.convertToElastic(taskOperationFailures.get(0).getCause());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could import org.elasticsearch.ExceptionsHelper and then you wouldn't need the org.elasticsearch. here or 2 lines below.

private final SourceConfig sourceConfig;
private final AggregationConfig aggregationConfig;

private static final ConstructingObjectParser<FeatureIndexBuilderJobConfig, String> PARSER = new ConstructingObjectParser<>(NAME, false,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Experience has shown that for stored configs it's best to have both a lenient parser that ignores unknown fields and a strict parser that throws an error on encountering an unknown field. See for example AnalysisConfig.LENIENT_PARSER and AnalysisConfig.STRICT_PARSER. Then use the strict parser when parsing user-supplied requests and the lenient parser when parsing configs retrieved from storage. That way things don't completely break in a mixed version cluster when the newer version has added an extra field to the config.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made a note int the meta issue (#33600), as we will highly likely move out of the cluster state (#33952) the config needs a full revisit

this.sources = new ArrayList<>(num);
for (int i = 0; i < num; i++) {
CompositeValuesSourceBuilder<?> builder = CompositeValuesSourceParserHelper.readFrom(in);
getSources().add(builder);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if this class was immutable, and this line shows why it isn't currently. This is how to make it immutable:

        List<CompositeValuesSourceBuilder<?>> sources = new ArrayList<>(num);
        for (int i = 0; i < num; i++) {
            CompositeValuesSourceBuilder<?> builder = CompositeValuesSourceParserHelper.readFrom(in);
            sources.add(builder);
        }
        this.sources = Collections.unmodifableList(sources);

}

public SourceConfig(List<CompositeValuesSourceBuilder<?>> sources) {
this.sources = sources;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For immutability:

    this.sources = Collections.unmodifiableList(new ArrayList<>(sources));


@Before
public void setUp() throws Exception {
super.setUp();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unusual to override the base class @Before method and then call the base class version first. JUnit will by default call all the base class @Before methods before the derived class @Before methods, so it would be more usual to just give the derived class @Before method a different name and let JUnit call the two methods in the standard order, avoiding the need to call super.setUp() here.

public static AggregationConfig randonAggregationConfig() {
AggregatorFactories.Builder builder = new AggregatorFactories.Builder();

for (int i = 1; i < randomIntBetween(1, 20); ++i) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it valid for the number of aggregator factories to be 0? If not maybe the AggregationConfig class itself should validate that it has at least 1?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, should be for (int i = 0 ...

Nevertheless, AggregationConfig misses validation, will be part of an upcoming PR.

public class SourceConfigTests extends AbstractSerializingFeatureIndexBuilderTestCase<SourceConfig> {

public static SourceConfig randomSourceConfig() {
int numSources = randomIntBetween(1, 10);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it valid for the number of sources to be 0? If not maybe the SourceConfig class itself should validate that it has at least 1?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the for-loop below counts from 0, so there will be at least 1 source for this test.

SourceConfig lacks validation at the moment, to be addressed in an upcoming PR.

@AwaitsFix(bugUrl="https://github.com/elastic/elasticsearch/issues/33942")
public class AggregationConfigTests extends AbstractSerializingFeatureIndexBuilderTestCase<AggregationConfig> {

public static AggregationConfig randonAggregationConfig() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: randon -> random

indexPattern = in.readString();
destinationIndex = in.readString();
sourceConfig = in.readOptionalWriteable(SourceConfig::new);
aggregationConfig = in.readOptionalWriteable(AggregationConfig::new);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there are unnecessarily many levels where null is allowed. You're allowing aggregationConfig to be null here, but also in AggregationConfig aggregatorFactoryBuilder is allowed to be null. I think at most one of these possibilities should be allowed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in AggregationConfig - not aiming for at the moment but in future I see the possibility of this builder to not use aggregations.

}

AggregationConfig(final StreamInput in) throws IOException {
aggregatorFactoryBuilder = in.readOptionalWriteable(AggregatorFactories.Builder::new);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there are unnecessarily many levels where null is allowed. You're allowing aggregatorFactoryBuilder to be null here, but also in FeatureIndexBuilderJobConfig aggregationConfig is allowed to be null. I think at most one of these possibilities should be allowed.

@hendrikmuhs hendrikmuhs removed the WIP label Sep 25, 2018
@hendrikmuhs
Copy link
Author

hendrikmuhs commented Sep 25, 2018

@droberts195 Thanks for the review, addressed most of your comments.

For the rest I am planning to tackle them in separate PR's - too many open tasks at the moment.

Copy link

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK cool, since it's only going to a feature branch I'm happy to leave everything else for future PRs

@hendrikmuhs hendrikmuhs merged commit 28e426a into elastic:feature/fib Sep 25, 2018
@hendrikmuhs hendrikmuhs deleted the feature/fib-basic-configuration branch October 18, 2018 08:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>feature :ml Machine learning

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants