-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Synthetic source error on script loads #88334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Synthetic source error on script loads #88334
Conversation
Refactors SourceLookup into a static source lookup used for most cases where we access the source again after the fetch phase, and a re-loading lookup used by scripts. The re-loading lookup now also fails with an error when we are using synthetic source preventing silent failures or non-sensical behavior from scripts.
|
Pinging @elastic/es-analytics-geo (Team:Analytics) |
|
I swore this PR existed somewhere, but checking on the status today I was surprised not to see it. If there is a duplicate somewhere somehow my bad! |
|
Hi @tmgordeeva, I've created a changelog YAML for you. |
|
@romseygeek Nik's out, and I heard you might be able to take a look at this? |
| */ | ||
| public SearchLookup lookup() { | ||
| if (this.lookup == null) { | ||
| SourceFieldMapper sourceMapper = mappingLookup == null ? null : (SourceFieldMapper) mappingLookup.getMapper("_source"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might just do boolean isSynthetic = here so you don't have to reason about "null means false"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or, rather, "null means true"
| } else { | ||
| Query rewrittenBackground = context.filterQuery(backgroundQuery); | ||
| this.backgroundFilter = rewrittenBackground; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this snuck in from another PR.
| public int docId() { | ||
| return docId; | ||
| } | ||
| int docId(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you are here, can you add javadoc for these? It'll really help whoever comes after. I know I often don't. But I should...
| import static java.util.Collections.emptyMap; | ||
|
|
||
| public class SourceLookup implements Map<String, Object> { | ||
| public interface SourceLookup extends Map<String, Object> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should/can confine the Map extension to scripts. It'd an odd thing to do outside of script compatibility and this is used so many places. It's probably not worth it for this PR, but it's an interesting question.
| Function<String, MappedFieldType> fieldTypeLookup, | ||
| BiFunction<MappedFieldType, Supplier<SearchLookup>, IndexFieldData<?>> fieldDataLookup, | ||
| boolean canReloadSource | ||
| ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend not to like adding these extra ctors if I can get away without it. Is this temporary? I think we may have talked about this but I'm forgetting.
Either way, probably worth copying the javadoc from above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a quick look it appears that all of the uses of the extra constructor are in test code, which makes me agree with Nik. Let's nuke it - it makes the PR a bit noisier but I think it will save confusion further down the line.
| stub.put(nestedFieldName, entry); | ||
| SourceLookup nestedSourceLookup = new SourceLookup(); | ||
| SourceLookup nestedSourceLookup = new SourceLookup.Static(); | ||
| nestedSourceLookup.setSource(filteredSource); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing this makes me think "SourceLookup.Static should take the source as a parameter - but I think that'd only work sometimes, right? In the fetch phase we don't have it available right away, I think.
| @@ -0,0 +1,5 @@ | |||
| pr: 88334 | |||
| summary: Synthetic source error on script loads | |||
| area: Geo | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm.... I think maybe the thing made a mistake. This isn't in Geo at all.
| } | ||
| } | ||
|
|
||
| public void testFieldBackground() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this also snuck in.
| fieldFetcher.setNextReader(readerContext); | ||
| SourceLookup sourceLookup = new SourceLookup(); | ||
| SourceLookup sourceLookup = new SourceLookup.Static(); | ||
| sourceLookup.setSegmentAndDocument(readerContext, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh this is fun. Static isn't going to be able to do anything with the readerContext or the docId. But this is how we pass those things into the fetching framework, right?
| @SuppressWarnings("rawtypes") | ||
| public void putAll(Map m) { | ||
| throw new UnsupportedOperationException(); | ||
| class Static extends AbstractSourceLookup { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These both deserve javadoc please.
I wonder if PreLoaded is a better name than Static. My mind jumps to java's static keyword and I keep thinking "there is only one of this". But static here means something more like "unchanging".
Speaking of, how hard would be be to make the Static version truly immutable? Not worth it in this PR, I'd wager, but maybe actually good?
SourceLookup now takes a SourceProvider which will describe how we expect to access the source. Methods for reading and parsing the source have been split out into the different source providers. It's now more explicit when SourceLookup is expecting to just re-use an existing map or bytes source object.
romseygeek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, thanks @tmgordeeva! I left a few comments but I think we're close.
server/src/main/java/org/elasticsearch/search/aggregations/bucket/terms/SignificanceLookup.java
Show resolved
Hide resolved
| slot | ||
| ); | ||
| subContext.sourceLookup().setSource(document); | ||
| subContext.sourceLookup().setSourceProvider(new SourceLookup.BytesSourceProvider(document)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of these days we will get rid of these setters. Ah well, progress not perfection!
| */ | ||
| public SearchLookup lookup() { | ||
| if (this.lookup == null) { | ||
| SourceFieldMapper sourceMapper = mappingLookup == null ? null : (SourceFieldMapper) mappingLookup.getMapper("_source"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this a new getSourceProvider() method on MappingLookup? That way we can easily add in future enhancements like a nested source provider.
| Function<String, MappedFieldType> fieldTypeLookup, | ||
| BiFunction<MappedFieldType, Supplier<SearchLookup>, IndexFieldData<?>> fieldDataLookup, | ||
| boolean canReloadSource | ||
| ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a quick look it appears that all of the uses of the extra constructor are in test code, which makes me agree with Nik. Let's nuke it - it makes the PR a bit noisier but I think it will save confusion further down the line.
| Function<String, MappedFieldType> fieldTypeLookup, | ||
| TriFunction<MappedFieldType, Supplier<SearchLookup>, MappedFieldType.FielddataOperation, IndexFieldData<?>> fieldDataLookup | ||
| TriFunction<MappedFieldType, Supplier<SearchLookup>, MappedFieldType.FielddataOperation, IndexFieldData<?>> fieldDataLookup, | ||
| SourceLookup sourceLookup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd prefer to pass the SourceProvider as a parameter here, rather than the SourceLookup directly
| } | ||
| if (sourceProvider instanceof ReaderSourceProvider) { | ||
| ((ReaderSourceProvider) sourceProvider).setSegmentAndDocument(context, docId); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a setSegmentAndDocument() method to the SourceProvider interface as well.
romseygeek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nearly there, I would like to double check one thing re script access to _source.
| pr: 88334 | ||
| summary: Synthetic source error on script loads | ||
| area: Geo | ||
| type: bug |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this area: Search and type: enhancement
|
|
||
| private LeafReader reader; | ||
| private CheckedBiConsumer<Integer, FieldsVisitor, IOException> fieldReader; | ||
| public class SourceLookup { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we double-check that removing the Map interface here doesn't break scripts that look things up via _source['foo']? There's a lot of dynamic typing that goes on in scripts and I'm a bit nervous that this will silently cause some issues. Maybe @stu-elastic or @jdconrad have a view? If it works, then that's awesome as having it here has been annoying me for several years now :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked through this with @stu-elastic, and this looks like this should be a safe change. Everywhere we allow source access we do through a DynamicMap where we call the source method to get the source Map directly instead of using SourceLookup. The one edge case I can see is a custom plugin may have its own specially defined context; however, we don't guarantee any stable api right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to also change the construction of the asMap member of LeafSearchLookup. Although this hasn't caused any test failures, which either means we have a gap in test coverage or that this isn't actually used anymore.
romseygeek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's get this merged! Thanks for all the iterations @tmgordeeva
Scripts will now error out if they attempt to load source with synthetic source on. As part of this change, SourceLookup is now refactored to split the cases where we intend to actually load the source (ie fetch phase) and cases where we use a source already provided.