-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Sketch out source value parsing. #56473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sketch out source value parsing. #56473
Conversation
|
Pinging @elastic/es-search (:Search/Search) |
689a120 to
fe4ad73
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this PR doesn't handle the null_value mapping options. Currently null values are treated as if the field doesn't exist.
nik9000
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me. I'm not entirely happy with the slippery nature of the instanceofs and the parsing, but at least on the parsing side I don't think we can do anything about it.
I still wish we could do this in "primitive aware" way so we didn't pass back Number objects. It isn't important for your feature, but it'll be more important for mine later on down the road. But we can adapt when we get there. No need to bend over backwards to design for that now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use a and b instead of first and second. When I read this I had to realize that we're relying on "first" being alphabetically before "second". That is convenient at all, but distracting to think about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been trying to declare methods as either final or abstract mostly because I keep bumping into situations where things that override "big" methods do a lot of copy and pasting or are just otherwise confusing. You certainly don't have to do it, but if you can build this so it is final and things that need to customize the logic more can do so via hooks or something I'd feel better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in agreement, this structure felt tricky to me. For context, the only field mapper that will override lookupValues directly is ConstantKeywordFieldMapper.
I played around with several alternatives but couldn't find a better approach. Do you have any suggestions (I'm wondering how we could be able to make this final)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not always a huge fan of extension through inheriance, but maybe this should be implemented in a FieldMapper that ConstantKeywordFieldMapper doesn't extend from and everything else does? I dunno.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea if it is better, but it is a thing I might try
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way we can avoid the instanceof somehow? I feel like we have so so so many of these already and they drive me crazy. They make it so hard to read classes and know what is up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ArrayValueMapperParser is a marker interface whose sole purpose to indicate to the parsing logic that FieldMapper#parse handles array values, and that they shouldn't be split up. I liked how this logic mirrored the document parsing logic in how it handled array values.
After you mentioned this, I looked into refactoring how we handle array-value parsing in general so we don't rely on mapper instanceof ArrayValueMapperParser checks. I wasn't able to refactor it successfully because of a complexity with copy_to fields. (In general I've been finding it really tricky to make incremental improvements to the document parsing code!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bleh! Well, adding this won't make it worse then! Marker interfaces are not my friend! But it isn't like you can do anything about it now.....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow! I get why we're doing this but wow!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this was not a proud moment 😊 Maybe I can instead introduce a method like GeoUtils.isSingleGeoPoint. I'll need general guidance from the geo experts on the best way to return geo data, I'll make sure to loop them in on the follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be useful to explicitly return Number from the method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 in the follow-up PR I can make sure these return the most specific type possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
And specifically around the approach - I'm happy with it. I do think it'd be cleaner to be able to deal in |
4de0916 to
e2f206f
Compare
fe4ad73 to
c362eea
Compare
This draft PR shows an approach for parsing out source values. It adds new method to
FieldMapper#lookupValues(SourceLookup)to let field mappers decide how to extract and parse the source values. This lets us return values like numbers and dates in a consistent format, and also handle special data types likeconstant_keyword. ThelookupValuesmethod calls intoparseSourceValue, which mappers can override to specify how values should be parsed.This PR is just to get feedback on the design and doesn’t handle all field types. I’ll submit a complete PR after, that addresses questions like 'should keywords be normalized?' and 'should geopoints be returned in a standard format?'
One aspect I didn’t like about this approach is that we’re adding yet another method to
FieldMapperthat specifies how stored values should be returned. For context, there’s alreadyvalueForDisplayfor stored fields, anddocValueFormatfor doc values.Sharing logic with document parsing?
I was excited about the idea of sharing logic between
FieldMapper#parseCreateFieldand this new methodparseSourceValue. This makes sense conceptually -- to return the values from source, we’re essentially re-parsing the document as if we’re ingesting it, and returning the values instead of creating Lucene fields. I could imagineparseCreateFieldbeing replaced by two methods, something like the following:So the
parseSourceValuemethod would be used both for indexing, and also when loading values from source. This has the nice benefit of removing duplication between field loading and document parsing. During indexing we make some subtle choices as to how to parse values -- for example boolean types interpret the empty string "" as 'false'. Unifying this logic would make differences in parsing less likely and seems more maintainable.I tried out this idea, and it didn’t turn out as nicely as I was hoping:
NumberFieldType.NumberType#parsefor example). There is also a lot of extra logic inparseCreateFieldthat makes reuse hard, for example handling external values.XContentParser. The methodRangeFieldMapper#parseSourceValueclass gives a good example of this, it would be harder to work with a parser.XContentParserfromSourceLookup(we could change this though, it’s just not access pattern it currently expects).Perhaps we could keep this refactor in mind as a future improvement. It could be more do-able with simplifications to parsing and more work on the
SourceLookupinterface.