Skip to content

Conversation

@tmgordeeva
Copy link
Contributor

Scripts will now error out if they attempt to load source with synthetic source on. As part of this change, SourceLookup is now refactored to split the cases where we intend to actually load the source (ie fetch phase) and cases where we use a source already provided.

Tatyana Gordeeva added 2 commits June 27, 2022 19:49
Refactors SourceLookup into a static source lookup used for most cases where we
access the source again after the fetch phase, and a re-loading lookup used by
scripts. The re-loading lookup now also fails with an error when we are using
synthetic source preventing silent failures or non-sensical behavior from
scripts.
@tmgordeeva tmgordeeva added the :Analytics/Geo Indexing, search aggregations of geo points and shapes label Jul 7, 2022
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 7, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@tmgordeeva
Copy link
Contributor Author

I swore this PR existed somewhere, but checking on the status today I was surprised not to see it. If there is a duplicate somewhere somehow my bad!

@tmgordeeva tmgordeeva added the >bug label Jul 7, 2022
@elasticsearchmachine
Copy link
Collaborator

Hi @tmgordeeva, I've created a changelog YAML for you.

@tmgordeeva
Copy link
Contributor Author

@romseygeek Nik's out, and I heard you might be able to take a look at this?

*/
public SearchLookup lookup() {
if (this.lookup == null) {
SourceFieldMapper sourceMapper = mappingLookup == null ? null : (SourceFieldMapper) mappingLookup.getMapper("_source");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might just do boolean isSynthetic = here so you don't have to reason about "null means false"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, rather, "null means true"

} else {
Query rewrittenBackground = context.filterQuery(backgroundQuery);
this.backgroundFilter = rewrittenBackground;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this snuck in from another PR.

public int docId() {
return docId;
}
int docId();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you are here, can you add javadoc for these? It'll really help whoever comes after. I know I often don't. But I should...

import static java.util.Collections.emptyMap;

public class SourceLookup implements Map<String, Object> {
public interface SourceLookup extends Map<String, Object> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should/can confine the Map extension to scripts. It'd an odd thing to do outside of script compatibility and this is used so many places. It's probably not worth it for this PR, but it's an interesting question.

Function<String, MappedFieldType> fieldTypeLookup,
BiFunction<MappedFieldType, Supplier<SearchLookup>, IndexFieldData<?>> fieldDataLookup,
boolean canReloadSource
) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend not to like adding these extra ctors if I can get away without it. Is this temporary? I think we may have talked about this but I'm forgetting.

Either way, probably worth copying the javadoc from above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a quick look it appears that all of the uses of the extra constructor are in test code, which makes me agree with Nik. Let's nuke it - it makes the PR a bit noisier but I think it will save confusion further down the line.

stub.put(nestedFieldName, entry);
SourceLookup nestedSourceLookup = new SourceLookup();
SourceLookup nestedSourceLookup = new SourceLookup.Static();
nestedSourceLookup.setSource(filteredSource);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this makes me think "SourceLookup.Static should take the source as a parameter - but I think that'd only work sometimes, right? In the fetch phase we don't have it available right away, I think.

@@ -0,0 +1,5 @@
pr: 88334
summary: Synthetic source error on script loads
area: Geo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm.... I think maybe the thing made a mistake. This isn't in Geo at all.

}
}

public void testFieldBackground() throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this also snuck in.

fieldFetcher.setNextReader(readerContext);
SourceLookup sourceLookup = new SourceLookup();
SourceLookup sourceLookup = new SourceLookup.Static();
sourceLookup.setSegmentAndDocument(readerContext, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh this is fun. Static isn't going to be able to do anything with the readerContext or the docId. But this is how we pass those things into the fetching framework, right?

@SuppressWarnings("rawtypes")
public void putAll(Map m) {
throw new UnsupportedOperationException();
class Static extends AbstractSourceLookup {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These both deserve javadoc please.

I wonder if PreLoaded is a better name than Static. My mind jumps to java's static keyword and I keep thinking "there is only one of this". But static here means something more like "unchanging".

Speaking of, how hard would be be to make the Static version truly immutable? Not worth it in this PR, I'd wager, but maybe actually good?

@elasticsearchmachine elasticsearchmachine changed the base branch from master to main July 22, 2022 23:05
@mark-vieira mark-vieira added v8.5.0 and removed v8.4.0 labels Jul 27, 2022
Tatyana Gordeeva added 5 commits July 27, 2022 10:06
SourceLookup now takes a SourceProvider which will describe how we expect to
access the source. Methods for reading and parsing the source have been split
out into the different source providers. It's now more explicit when
SourceLookup is expecting to just re-use an existing map or bytes source
object.
@nik9000 nik9000 requested a review from romseygeek July 29, 2022 18:28
Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks @tmgordeeva! I left a few comments but I think we're close.

slot
);
subContext.sourceLookup().setSource(document);
subContext.sourceLookup().setSourceProvider(new SourceLookup.BytesSourceProvider(document));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of these days we will get rid of these setters. Ah well, progress not perfection!

*/
public SearchLookup lookup() {
if (this.lookup == null) {
SourceFieldMapper sourceMapper = mappingLookup == null ? null : (SourceFieldMapper) mappingLookup.getMapper("_source");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this a new getSourceProvider() method on MappingLookup? That way we can easily add in future enhancements like a nested source provider.

Function<String, MappedFieldType> fieldTypeLookup,
BiFunction<MappedFieldType, Supplier<SearchLookup>, IndexFieldData<?>> fieldDataLookup,
boolean canReloadSource
) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a quick look it appears that all of the uses of the extra constructor are in test code, which makes me agree with Nik. Let's nuke it - it makes the PR a bit noisier but I think it will save confusion further down the line.

Function<String, MappedFieldType> fieldTypeLookup,
TriFunction<MappedFieldType, Supplier<SearchLookup>, MappedFieldType.FielddataOperation, IndexFieldData<?>> fieldDataLookup
TriFunction<MappedFieldType, Supplier<SearchLookup>, MappedFieldType.FielddataOperation, IndexFieldData<?>> fieldDataLookup,
SourceLookup sourceLookup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer to pass the SourceProvider as a parameter here, rather than the SourceLookup directly

}
if (sourceProvider instanceof ReaderSourceProvider) {
((ReaderSourceProvider) sourceProvider).setSegmentAndDocument(context, docId);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a setSegmentAndDocument() method to the SourceProvider interface as well.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nearly there, I would like to double check one thing re script access to _source.

pr: 88334
summary: Synthetic source error on script loads
area: Geo
type: bug
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this area: Search and type: enhancement


private LeafReader reader;
private CheckedBiConsumer<Integer, FieldsVisitor, IOException> fieldReader;
public class SourceLookup {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we double-check that removing the Map interface here doesn't break scripts that look things up via _source['foo']? There's a lot of dynamic typing that goes on in scripts and I'm a bit nervous that this will silently cause some issues. Maybe @stu-elastic or @jdconrad have a view? If it works, then that's awesome as having it here has been annoying me for several years now :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked through this with @stu-elastic, and this looks like this should be a safe change. Everywhere we allow source access we do through a DynamicMap where we call the source method to get the source Map directly instead of using SourceLookup. The one edge case I can see is a custom plugin may have its own specially defined context; however, we don't guarantee any stable api right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to also change the construction of the asMap member of LeafSearchLookup. Although this hasn't caused any test failures, which either means we have a gap in test coverage or that this isn't actually used anymore.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get this merged! Thanks for all the iterations @tmgordeeva

@tmgordeeva tmgordeeva merged commit db5ddb3 into elastic:main Aug 26, 2022
@tmgordeeva tmgordeeva deleted the fix/synthetic-source-scripts branch August 26, 2022 18:32
@nik9000 nik9000 mentioned this pull request Sep 12, 2022
50 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Geo Indexing, search aggregations of geo points and shapes >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.5.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants