Skip to content

Conversation

@s1monw
Copy link
Contributor

@s1monw s1monw commented Sep 20, 2017

Today we represent each value of a list setting with it's own dedicated key that ends with the index of the value in the list. Aside of the obvious weirdness this has several issues especially if lists are massive since it causes massive runtime penalties when validating settings. Like a list of 100k words will literally cause a create index call to timeout and in-turn massive slowdown on all subsequent validations runs.

This change moves away from the current internal representation towards a single decoded string representation. A list is encoded into a JSON list internally in a fully backwards compatible way. A list is then a single key value pair in settings and all prefix based settings are internally converted to the new representation in the settings builder. This change also forbids to add a settings that ends with a .0 which was internally used to detect a list setting. Once this has been rolled out for an entire major version all the internal .0 handling can be removed since all settings will be converted.

Today we represent each value of a list setting with it's own dedicated key that ends
with the index of the value in the list. Aside of the obvious weirdness this has several
issues especially if lists are massive since it causes massive runtime penatlies when validating
settings. Like a list of 100k words will literally cause a create index call to timeout and in-turn
massive slowdown on all subsequent validations runs. This change moves away from the current internal
representation towards a single decoded string representation. A list is encoded into a JSON list internally
in a fully backwards compatible way. A list is then a single key value pair in settings and all prefix based
settings are internally converted to the new representation in the settings builder.
This change also forbids to add a settings that ends with a `.0` which was internally used to detect a list setting.
Once this has been rolled out for an entire major version all the internal `.0` handling can be removed since
all settings will be converted.
@s1monw s1monw added :Core/Infra/Settings Settings infrastructure and APIs >bug v7.0.0 labels Sep 20, 2017
@s1monw s1monw changed the title Chance format how settings represent lists / array Change format how settings represent lists / array Sep 20, 2017
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at it and thought about it some. I've left what I thought. I'll come back and have another look soon.


private static boolean isInternalList(String value) {
return value != null
&& value.length() >= 4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be 2. 4 is the minimum number of bytes but length is in chars.

}

private static List<String> maybeGetList( String value) throws IOException {
if (value != null && isInternalList(value)) { // we try to write it as a list if it is a list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think "write" is correct here.

* @return The builder
*/
public Builder put(String key, String value) {
if (key.endsWith(".0")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me uncomfortable.... I see that we used to do it as well. Can we get away with not doing it anymore now that we have putArray?

someGroup, someAffix)));
Settings diff = settings.diff(Settings.builder().put("foo.bar", 5).build(), Settings.EMPTY);
assertEquals(4, diff.size()); // 4 since foo.bar.quux has 3 values essentially
assertEquals(2, diff.size()); // 4 since foo.bar.quux has 3 values essentially
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the comment anymore.

Settings.builder().put("some.group.foo", 5).build(),
Settings.builder().put("some.group.foobar", 17, "some.group.foo", 25).build());
assertEquals(6, diff.size()); // 6 since foo.bar.quux has 3 values essentially
assertEquals(4, diff.size()); // 6 since foo.bar.quux has 3 values essentially
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same. These comments were helpful but are not incorrect.

Settings.builder().put("foo.bar", 5).build(),
Settings.builder().put("foo.bar.baz", 17).putArray("foo.bar.quux", "d", "e", "f").build());
assertEquals(4, diff.size()); // 4 since foo.bar.quux has 3 values essentially
assertEquals(2, diff.size()); // 4 since foo.bar.quux has 3 values essentially
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too.

Settings.builder().put("some.prefix.foobar.somekey", 17,
"some.prefix.foo.somekey", 18).build());
assertEquals(6, diff.size()); // 6 since foo.bar.quux has 3 values essentially
assertEquals(4, diff.size()); // 6 since foo.bar.quux has 3 values essentially
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here.

@rjernst
Copy link
Member

rjernst commented Sep 26, 2017

When discussing this many weeks ago, I suggested having the value of the internal map be a discriminated union. This could either simply be Object with the discrimination done by instanceof checks, or a simple POJO with a boolean+Object. I'm apprehensive about this PR because it means every iteration of a list setting requires creation of new strings (by parsing the "list" string). Why are these special marker strings to denote whether the value is a list any better than actually having a List and using instanceof? In either case, we have to ensure access to the internal map is removed, but the latter is much more transparent and easier to not mess up.

@s1monw
Copy link
Contributor Author

s1monw commented Sep 27, 2017

I am closing this one since the encapsulation of Settings is so broken today I first have to clean up all the cruft around it to make it work. I think ultimately we can just have a String,Object map internally but before that we have to get rid of things like getAsMap on settings.

@s1monw s1monw closed this Sep 27, 2017
@s1monw s1monw deleted the fix_list_settings branch October 4, 2017 07:24
s1monw added a commit to s1monw/elasticsearch that referenced this pull request Oct 4, 2017
Today we represent each value of a list setting with it's own dedicated key
that ends with the index of the value in the list. Aside of the obvious
weirdness this has several issues especially if lists are massive since it
causes massive runtime penalties when validating settings. Like a list of 100k
words will literally cause a create index call to timeout and in-turn massive
slowdown on all subsequent validations runs.

With this change we use a simple string list to represent the list. This change
also forbids to add a settings that ends with a .0 which was internally used to
detect a list setting.  Once this has been rolled out for an entire major
version all the internal .0 handling can be removed since all settings will be
converted.

Relates to elastic#26723
s1monw added a commit that referenced this pull request Oct 5, 2017
Today we represent each value of a list setting with it's own dedicated key
that ends with the index of the value in the list. Aside of the obvious
weirdness this has several issues especially if lists are massive since it
causes massive runtime penalties when validating settings. Like a list of 100k
words will literally cause a create index call to timeout and in-turn massive
slowdown on all subsequent validations runs.

With this change we use a simple string list to represent the list. This change
also forbids to add a settings that ends with a .0 which was internally used to
detect a list setting.  Once this has been rolled out for an entire major
version all the internal .0 handling can be removed since all settings will be
converted.

Relates to #26723
s1monw added a commit that referenced this pull request Oct 5, 2017
Today we represent each value of a list setting with it's own dedicated key
that ends with the index of the value in the list. Aside of the obvious
weirdness this has several issues especially if lists are massive since it
causes massive runtime penalties when validating settings. Like a list of 100k
words will literally cause a create index call to timeout and in-turn massive
slowdown on all subsequent validations runs.

With this change we use a simple string list to represent the list. This change
also forbids to add a settings that ends with a .0 which was internally used to
detect a list setting.  Once this has been rolled out for an entire major
version all the internal .0 handling can be removed since all settings will be
converted.

Relates to #26723
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Core/Infra/Settings Settings infrastructure and APIs v7.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants