Skip to content

Conversation

@jpountz
Copy link
Contributor

@jpountz jpountz commented May 3, 2017

Now that indices have a single type by default, we can move to the next step
and identify documents using their _id rather than the _uid.

One notable change in this commit is that I made deletions implicitly create
types. This helps with the live version map in the case that documents are
deleted before the first type is introduced. Otherwise there would be no way
to differenciate DELETE index/foo/1 followed by PUT index/foo/1 from
DELETE index/bar/1 followed by PUT index/foo/1, even though those are
different if versioning is involved.

@jpountz jpountz added :Search Foundations/Mapping Index mappings, including merging and defining field types >enhancement labels May 3, 2017
@jpountz jpountz force-pushed the fix/do_not_include_type_in_uid branch from 3775490 to 038b09e Compare May 3, 2017 14:48
@s1monw s1monw self-requested a review May 4, 2017 15:35
@jpountz jpountz force-pushed the fix/do_not_include_type_in_uid branch 2 times, most recently from 352001a to 57728e2 Compare May 5, 2017 08:56
Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left an initial set of comments, engine stuff looks good

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add a final member to IndexSettings that holds the value of this setting. I don't think it can change in any way, can it?... just checked it's final so lets just make it a first class citizen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here you should use the seq number of the request I think.. it's not necessarily a pre 6.0 index?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed it will actually be a 6.x index most of the time. I used the seq no of the primary response, is it the right thing to do?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look very close here but can we keep to to one enum somehow and pass in the field mapper name as a ctor argument? it should stay the same across the lifetime of an index?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can make that decision ahead of time when we create the engine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to make changes around those lines. Information is kept in static threadlocals so we can't really pass the information at construction time, but it should be at least better validated now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you be more clear what we do here? it's really a wrapper around DV not in-memory FieldData right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make the class final?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually it IS a wrapper around in-memory fielddata. See changes in the docs for instance where I deprecated fielddata access on the _uid and _id fields.

@jpountz
Copy link
Contributor Author

jpountz commented May 9, 2017

@s1monw Thanks for looking, I made some changes.

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some minors here LGTM otherwise

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we assert that it has only one type at most?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this TODO worries me a bit, I think we use this access by default for instance when we slice scrolls? Should we add docvalues to _id as well? I wonder if we should open a new issue and make it a blocker to ensure we make a call here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is one already, adding doc values was a bit controversial #11887

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might miss something but we are on the ID field that doesn't have a type? why are we using Uid.createUidsForTypesAndIds(context.queryTypes(), values)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only triggered when indexOptions() == IndexOptions.NONE, which means it is a 5.x index, I'll add a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks that was confusing to me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have IndexSettings available when you create IdFieldMapper I wonder if we can just pass a boolean down to the ctor if we have more than one type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same what I mentioned above, we have IndexSettings available here can we maybe pass a boolean down? also maybe we should just reference a static method from IdFieldMapper here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as a followup I wonder if we can use the seq ID for this now since it has docvalues? That would be a much better default IMO, at least for 6.0.0 indices

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether there is an expectation that a given seed always gives the same score to a given id, regardless of whether it was updated. I remember Robert arguing that we should just use the Lucene doc id for random score generation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slicing is not about scoring, it's about partitioning the data afaik? with lucene doc IDs you can not resume a slice since you might hit a different replica when you resume. I wonder if that is a different problem here... for random score I agree we might just go down that path by default and if somebody needs reproducibility we can still use something like the ID or seq ids?

jpountz added 3 commits May 9, 2017 14:43
Now that indices have a single type by default, we can move to the next step
and identify documents using their `_id` rather than the `_uid`.

One notable change in this commit is that I made deletions implicitly create
types. This helps with the live version map in the case that documents are
deleted before the first type is introduced. Otherwise there would be no way
to differenciate `DELETE index/foo/1` followed by `PUT index/foo/1` from
`DELETE index/bar/1` followed by `PUT index/foo/1`, even though those are
different if versioning is involved.
@jpountz jpountz force-pushed the fix/do_not_include_type_in_uid branch from ceda51d to 2a5e282 Compare May 9, 2017 12:43
Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM still.. you got a sysout in your last commit that you might wanna remove...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks that was confusing to me

}

public static void main(String[] args) {
System.out.println(precisionFromThreshold(50));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally!

@jpountz jpountz merged commit a72eaa8 into elastic:master May 9, 2017
@jpountz jpountz deleted the fix/do_not_include_type_in_uid branch May 9, 2017 14:33
rjernst added a commit that referenced this pull request May 9, 2017
jpountz added a commit that referenced this pull request May 10, 2017
Now that indices have a single type by default, we can move to the next step
and identify documents using their `_id` rather than the `_uid`.

One notable change in this commit is that I made deletions implicitly create
types. This helps with the live version map in the case that documents are
deleted before the first type is introduced. Otherwise there would be no way
to differenciate `DELETE index/foo/1` followed by `PUT index/foo/1` from
`DELETE index/bar/1` followed by `PUT index/foo/1`, even though those are
different if versioning is involved.
jpountz added a commit to jpountz/elasticsearch that referenced this pull request Jun 9, 2017
This was introduced in elastic#24460: the constructor of `Translog.Delete` that takes
a `StreamInput` does not set the type and id. To make it a bit more robust, I
made fields final so that forgetting to set them would make the compiler
complain.
jpountz added a commit that referenced this pull request Jun 9, 2017
…4586)

This was introduced in #24460: the constructor of `Translog.Delete` that takes
a `StreamInput` does not set the type and id. To make it a bit more robust, I
made fields final so that forgetting to set them would make the compiler
complain.
jpountz added a commit that referenced this pull request Jun 9, 2017
…4586)

This was introduced in #24460: the constructor of `Translog.Delete` that takes
a `StreamInput` does not set the type and id. To make it a bit more robust, I
made fields final so that forgetting to set them would make the compiler
complain.
jpountz added a commit that referenced this pull request Jun 9, 2017
…4586)

This was introduced in #24460: the constructor of `Translog.Delete` that takes
a `StreamInput` does not set the type and id. To make it a bit more robust, I
made fields final so that forgetting to set them would make the compiler
complain.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types v6.0.0-alpha2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants